NASA CR 114278 


CODING SYSTEMS STUDY FOR 
HIGH DATA RATE TELEMETRY LINKS 

By: K.S. Gilhousen 

J.A. Heller 
I.M. Jacobs 
A.J. Viterbi 


January 1971 


Distribution of this report is provided in the interest of 
information exchange. Responsibility for the contents 
resides in the author or organization that prepared it. 


Prepared under Contract No. NAS2-6024 by 
LINKABIT CORPORATION 
San Diego, California 


for 

AMES RESEARCH CENTER 


NATIONAL AERONAUTICS AND SPACE ADMINISTRATION 




TABLE OF CONTENTS 


SECTION TITLE PAGE 

1.0 INTRODUCTION 1 

2.0 VITERBI DECODER STUDY 3 

2.1 Introduction and Fundamentals 3 

2.1.1 Code Representation 3 

2.1.2 The Viterbi Decoding Algorithm for 
the Binary Symmetric Channel (Hard 
Decision Outputs of a Gaussian 

Channel) 8 

2.1.3 Distance Properties of Convolutional 

Codes 11 

2.1.4 Generalization to Arbitrary Convolu- 
tional Codes 13 

2.1.5 Systematic, Nonsvstematic , and Cat- 
astrophic Codes 15 

2.1.6 Generalization of Viterbi Decoder to 
the Additive White Gaussian Noise 

Channel 18 

2.1.7 Metric Quantization, Path Memory 

Truncation, and Other System Con- 
siderations 23 

2.2 Rate 1/2 Convolutional Codes and Viterbi Decoders 26 

2.2.1 Good Convolutional Codes 27 

2.2.2 Numerical Code Performance Bound 29 

2.2.3 Viterbi Decoder Simulation Program 35 

2.2.4 Simulation and Numerical Performance 

Data . 36 

2. 2. 4.1 General Performance Results...... 36 

2. 2. 4. 2 Receiver Quantization 41 

2 . 2 . 4 . 3 Path Memory 42 

2. 2. 4. 4 Decoder Output Selection. 45 

2.2.5 Code Synchronization and Channel 

Reliability 46 

2. 2. 5.1 Node Synchronization and 

Phase Ambiguity Resolution.. 46 

2. 2. 5. 2 Transparent Codes 60 

2.2. 5. 3 Channel Reliability Information.. 63 

2.2.6 Sensitivity to AGC Inaccuracy 64 

2.3 Other Code Rates 64 

2.3.1 Description of Code Search Program 66 

2.3.2 Good Rate 1/3, 2/3, and 3/4 Codes 69 

2.3.3 Simulation and Numerical Performance 
Data 


// 


69 



TABLE OF CONTENTS (Continued) 


SECTION TITLE PAGE 

2.3.4 Comparison With Rate 1/2 Code 76 

2.4 Viterbi Decoder Implementation 79 

2.4.1 Review of Decoder Algorithm 79 

2.4.2 Metric Compression... 34 

2.4.3 Overflow Protection 39 

2.4.4 Storage of State Metrics 94 

2.4.5 Arithmetic Logical Section.. 96 

2. 4. 5.1 ECL Arithmetic-Logical Unit 99 

2. 4. 5.2 TTL ACS Unit IQ 4 

2.4.6 Decision Memory and Outout Selection 197 

2. 4. 6.1 ECL Memory Output Section... 110 

2. 4. 6 . 2 TTL Memory Output Section m 

2.4.7 Synchronization Section 113 

2.4.8 Trade-Off Section 117 

2. 4. 8.1 Cost-Complexity Trade-Offs 117 

2. 4. 8 . 2 Cost vs. Constraint Length 119 

2.4. 8 . 3 Cost vs. Code Rate 120 

2. 4. 8 . 4 Cost vs. Quantization 121 

3.0 SEQUENTIAL DECODING 123 

3.1 Background ^23 

3.2 Hard Decision Decoder ^29 

3.2.1 Syndrome Sequential Decoder 12 9 

3.2.2 Algorithm Modifications 131 

3. 2. 2.1 Guess and Restart Overflow 

Strategy 131 

3. 2. 2. 2 Quick Threshold Loosening 132 

3.2.2. 3 Look Ahead Sequential Decoding... 141 

3. 2. 2. 4 Sequential Decoding with Side- 
ways Looks 14 4 

3.2.3 Decoder Undetected Error and Computational 

Performance 144 

3.2. 3.1 Code Selection 145 

3.2. 3.2 Decoder Parameters 146 

3.2. 3. 3 The Distribution of Decoding 

Computations 14 3 

3. 2. 3. 4 Measured Undetected Error Rates.. 143 

3.2.4 Real Time Sequential Decoder Simulation.. 152 

3.2.5 Erasures vs. Undetected Errors 157 

3.2.6 Systematic vs. Nonsystematic Codes. 157 

3.2.7 Code Synchronization and Channel Reli- 
ability Prediction 160 



TABLE OF CONTENTS (Continued) 


SECTION TITLE PAGE 

3.3 Soft Decision Sequential Decoding 160 

3.3.1 Syndrome Decoder 160 

3.3.2 Fano Algorithm Modifications 162 

3.3.3 Sensitivity to Incorrect AGC 162 

3.3.4 Comparisons of Soft and Hard Decision 

Sequential Decoders.. 16 3 

3.4 Sequential Decoder Implementation. 16 3 

3.4.1 40 Mbps Sequential Decoder 167 

3.4.2 Code Synchronization 171 

3.4.3 Input Buffer 172 

3.4.4 Received Information Bit Storage 174 

3.4.5 Syndrome Generator 176 

3.4.6 Decoder Memory 178 

3.4.7 CPU Buffer 181 

3.4.8 CPU 183 

3.4.9 Physical Description 188 

3.4.10 Modifications Required by Soft Decisions.. 189 

4.0 CODING FOR DATA OF VARYING SPEED AND ERROR RATE 

REQUIREMENTS 192 

4.1 Concatenated Coding 192 

4.2 Lengthened Symbol Times for Low-Rate Data 194 

4.3 Lower Rate Codes 195 

5.0 PREDECODING FOR A SEQUENTIAL DECODER: A HYBRID IMPLE- 
MENTATION FOR VERY LOW ERROR PROBABILITY 19 8 

5.1 Introduction..... 19 8 

5.2 Deletion Probabilities of Predecoder 201 

5.3 Computational Complexity of Sequential Decoding 

for Erroneously Predecoded Deleted Frames 202 

5.4 Overflow Probability of Hybrid Implementation 204 

5.5 System Analysis of Possible Hybrid Implementation 205 

5.6 Conclusions 208 

APPENDIX A 213 

APPENDIX B 218 

APPENDIX C 232 


'/Vi 




1.0 INTRODUCTION 


This report presents the results of a study of coding systems 
for high data rate links. The emphasis throughout is on convolu- 
tional codes. This is because high performance decoders exist 
for this class of codes, which are practical to implement at multi- 
megabit data rates. 

The bit error rate, in the 10" 3 to 10" 8 range, vs. E^/No 
performance of many coding system configurations has been studied, 
through simulation and analytical techniques. Special attention 
has been naid to the sensitivity of performance to decoder para- 
meters which affected complexity and cost significantly. These 
decoder parameters include 

a) Code constraint length 

b) Code rate (bandwidth expansion) 

c) Data rate 

d) Speed factor and buffer size (sequential decoding) 

e) Path memory length, metric representation, and decision 
output selection (Viterbi decoding) 

f) Receiver quantization. 

A technique for obtaining and maintaining node synchronization 
and resolving phase ambiguities has been devised and analyzed. 

This technique is quite simple to implement. Simulation indicates 
that performance is more than adequate for the systems under con- 
sideration. 
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The report is divided into four major sections, plus the 
introduction. Each section deals with a distinct type of decoder 
or combination of decoders. Section 2 is concerned with Viterbi 
decoders. The important tradeoffs between performance and com- 
plexity are discussed here. The section concludes with a dis- 
cussion of several methods of implementing Viterbi decoders at 
various data rates, and their relative complexity and cost. 

Section 3 treats sequential decoding. Several techniques 
which either simplify or improve the performance of the Fano 
algorithm are discussed and evaluated. The implementation sub- 
section emphasizes high speed rate 1/2, hard decision sequential 
decoding. 

Section 4 evaluates a simple scheme for providing different 
levels of coding for data with different error rate requirements. 

Finally, an interesting method of predecoding high rate 
received data using a Viterbi decoder, and shunting occasional 
difficult data to a sequential decoder, is discussed in Section 
5. This technique holds out the possibility of efficiently de- 
coding very high rate data using a relatively slow sequential 
decoder. 
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2.0 VITERBI DECODER STUDY 


2.1 Introduction and Fundamentals 


2.1.1 Code Representation . A convolutional encoder is a 
linear finite-state machine consisting of a K-stage shift regis- 
ter and n linear algebraic function generators. The input data, 
which is usually though not necessarily binary, is shifted along 
the register b bits at a time. An example with K=3, n=2, b=l 
is shown in Fig. 2.1. 


010001 ... 


001101010010. 

code sequence 



011010 ... 
data sequence 


Fig. 2.1 Convolutional Coder for K=3, n=2, b=l 

The binary input data and output code sequences are indicated on 
the diagram. The first three input bits, 0, 1, and 1, generate 
the code outputs 00, 11, and 01 respectively. We shall pursue 
this example to develop various representations of convolutional 
codes and their properties. The techniques thus developed will 
then be shown to generalize directly to any convolutional code. 
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If the first input bit is a zero the code symbols are those 
shown on the first upper branch, while if it is a one the output 
code symbols are those shown on the first lower branch. Similarly, 
if the second input bit is a zero we trace the tree diagram to the 
next upper branch, while if it is a one we trace the diagram down- 
ward. In this manner all thirty-two possible outputs for the 
first five inputs may be traced. 

From the diagram it also becomes clear that after the first 
three branches the structure becomes repetitive. In fact, we 
readily recognize that beyond the third branch the code symbols 
on branches emanating from the two nodes labelled "a" are iden- 
tical, and similarly for all correspondingly labelled pairs of 
nodes. The reason for this is obvious from examination of the 
encoder. As the fourth input bit enters the coder at the right, 
the first data bit falls off on the left end and no longer influ- 
ences the output code symbols. Consequently, the data sequences 
lOOxy... and OOOxy... generate the same code symbols after the 
third branch and, as is shown in the tree diagram, both nodes 
labelled "a" can be joined together. 

This leads to redrawing the tree diagram as shown in Figure 
2.3. This has been called a trellis diagram since a trellis is a 
tree-like structure with remerging branches. We adopt the 
convention here that code branches produced by a "zero" input 
bit are shown as solid lines and code branches produced by a 
"one" input bit are shown dashed. 
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00 00 00 a 00 a 00 



Figure 2.3 Trellis Code Representation for 
Coder of Figure 2.1. 

The completely repetitive structure of the trellis diagram 
suggests a further reduction in the representation of the* code 
to the state-diagram of Figure 2.4. The "states" of the state- 
diagram are labelled according to the nodes of the trellis 
diagram. However , since the states correspond merely to the last 
two input bits to the coder we may use these bits to denote the 
nodes or states of this diagram. 


6 - 


10 



Figure 2.4. State-Diagram Representation for 
Coder of Figure 2*1. 


We observe finally that the state-diagram can be drawn 
directly by observing the finite-state machine properties of the 
encoder and particularly the fact that a four-state directed 
graph can be used to represent uniquely the input-output relation 
of the eight-state machine. For the nodes represent the previous 
two bits while the present bit is indicated by the transition 
branch; for example , if the encoder (machine) contains Oil, 
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this is represented in the diagram by the transition from state 
b = 01 to state d =» 11 and the corresponding branch indicates the 
code symbol outputs 01. 

2.1.2 The Viterbi Decoding Algorithm for the Binary Sym- 
metric Channel (Hard Decision Outputs of a Gaussian Channel) . On 
a binary symmetric channel, errors which transform a channel code 
symbol 0 to 1 or 1 to 0 are assumed to occur independently from 
symbol to symbol with probability p. If all input (message) 
sequences are equally likely, the decoder which minimizes the 
overall error probability for any code, block or convolutional, 
is one which examines the error-corrupted received sequence 
yy . . .y.... and chooses the data sequence corresponding to the 

1 2 j 

transmitted code sequence XjX 2 . . .x.j . . . which is closest to the 
received sequence in the sense of Hamming distance; that is the 
transmitted sequence which differs from the received sequence in 
the minimum number of symbols. 

Referring first to the tree diagram, this implies that we 
should choose that path in the tree whose code sequence differs 
in the minimum number of symbols from the received sequence. 
However, recognizing that the transmitted code branches remerge 
continually, we may equally limit our choice to the possible paths 
in the trellis diagram of Figure 2.3. Examination of this dia- 
gram indicates that it is unnecessary to consider the entire 
received sequence (which conceivably could be thousands or 
millions of symbols in length) at one time in deciding upon the 
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most likely (minimum distance) transmitted sequence. In parti- 
cular, immediately after the third branch we may determine which 
of the two paths leading to node or state "a" is more likely to 
have been sent. For example, if 010001 is received, it is clear 
that this is at distance 2 from 000000 while it is at distance 
3 from 111011 and consequently we may exclude the lower path into 
node "a". For, no matter what the subsequent received symbols 
will be, they will effect the distances only over subsequent 
branches after these two paths have remerged and consequently in 
exactly the same way. The same can be said for pairs of paths 
merging at the other three nodes after the third branch. We 
shall refer to the minimum distance path of the two paths merging 
at a given node as the "survivor". Thus it is necessary only to 
remember which was the minimum distance path from the received 
sequence (or survivor) at each node, as well as the value of that 
minimum distance. This is necessary because at the next node 
level we must compare the two branches merging at each node level, 
which were survivors at the previous level for different nodes; 
e.g., the comparison at node "a" after the fourth branch is 
among the survivors of comparison at nodes "a" and "c" after 
the third branch. For example, if the received sequence over 
the first four branches is 01000111, the survivor at the third 
node level for node "a" is 000000 with distance 2 and at node 
"c" it is 110101, also with distance 2. In going from the third 
node level to the fourth the received sequence agrees precisely 
with the survivor from "c" but has distance 2 from the survivor 
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from "a". Hence the survivor at node "a" of the fourth level is 
the data sequence 1100 which produced the code sequence 11010111 
which is at (minimum) distance 2 from the received sequence. 

In this way, we may proceed through the received sequence 
and at each step preserve one surviving path and its distance 
from the received sequence, which is more generally called metric . 
The only difficulty which may arise is the possibility that in a 
given comparison between merging paths, the distances or metrics 
are identical. Then we may simply flip a coin as is done for 
block code words at equal distances from the received sequence. 

For even if we preserved both of the equally valid contenders, 
further received symbols would affect both metrics in exactly the 
same way and thus not further influence our choice. 

This decoding algorithm was first proposed by Viterbi (Ref. 8) 
in the more general context of arbitrary memoryless channels. 
Another description of the algorithm can be obtained from the 
state-diagram representation of Figure 2.4. Suppose we sought 
that path around the directed state-diagram, arriving at node 
"a" after the kth transition, whose code symbols are at a minimum 
distance from the received sequence. But clearly this minimum 
distance path to node "a" at time k can be only one of two can- 
didates: the minimum distance path to node "a" at time k-1 and 
the minimum distance path to node "c" at time k-1. The compari- 
son is performed by adding the new distance accumulated in the kth 
transition by each of these paths to their minimum distances 
(metrics) at time k-1. 
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It appears thus that the state-diagram also represents a 
system diagram for this decoder. With each node or state, we 
associate a storage register which remembers the minimum distance 
path into the state after each transition as well as a metric 
register which remembers its (minimum) distance from the received 
sequence. Furthermore, comparisons are made at each step be- 
tween the two paths which lead into each node. Thus four com- 
parators must also be provided. 

We defer the question of truncating the trellis and thereby 
making a final decision on all bits beyond L branches prior to 
the given branch until we have some additional properties of con- 
volutional codes. 

2.1.3 Distance Properties of Convolutional Codes . We con- 
tinue to pursue the example of Figure 2.1 for the sake of clarity; 
in the next section, we shall easily generalize results. It is 
well known that convolutional codes are group codes. Thus there 
is no loss in generality in computing the distance from the all 
zeros code word to all the other code words, for this set of dis- 
tances is the same as the set of distances from any specific code- 
word to all the others. 

For this purpose, we may again use either the trellis diagram 
or the state-diagram. We first of all redraw the trellis diagram 
in Figure 2.5 labelling the branches according to their distances 
from the all zeros path. Now consider all the paths that merge 
with the all zeros for the first time at some arbitrary node "j". 
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0 


0 


0 


0 


0 



Figure 2.5 Trellis Diagram Labelled with 
Distances from all Zeros Path 

It is seen from the diagram that of these paths, there will be 
just one path at distance 5 from the all zeros path and this 
diverged from it three branches back. Similarly there are two 
at distance 6 from it; one which diverged 4 branches back and the 
other which diverged 5 branches back, and so forth. We note also 
that the input bits for distance 5 path are 00.. 0100 and- thus 
differ in only one input bit from the all zeros, while the 
distance 6 paths are 00.. 01100 and 00.. 010100 and thus differs 
in 2 input bits from the all zeros path. The minimum distance, 
sometimes called the minimum " free " distance , among all paths 
is thus seen to be 5. This implies that any pair of channel 
errors can be corrected, for two errors will cause the received 
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sequence to be at distance 2 from the transmitted (correct) 
sequence but it will be at least at distance 3 from any other 
possible code sequence. It appears thus that the distance of 
all paths from the all zeros (or any arbitrary) path can be 
so determined from the trellis diagram. 

2.1.4 Generalization to Arbitrary Convolutional Codes . The 
generalization of these techniques to arbitrary binary-tree (b=l) 
convolutional codes is immediate. That is, a coder with a K 
stage shift register and n modulo-two adders will produce a trel- 

K- 

lis or state-diagram with 2 nodes or states and each branch 
will contain "n" code symbols. The rate of this code is then 


n code symbol 

The example pursued in the previous sections had rate R=l/2. The 

primary characteristic of the binary-tree codes is that only two 

branches exit from and enter each node. 

If rates other than 1/n are desired, we must make b>l, where 

b is the number of bits shifted into the register at one time. An 

example for K=2, b=2, n=3 and consequently rate R=2/3 is shown 

in Figure 2.6 and its state-diagram is shown in Figure 2.7. 

It differs from the binary-tree codes only in that each node 

is connected to four other nodes, and for general "b" , it will 

t) 

be connected to 2 nodes. Still all the preceding techniques 
including the trellis and state-diagram analysis are still 
applicable. It must be noted, however, that the minimum distance 
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decoder must make comparisons among all the paths entering 
each node at each level of the trellis and select one survivor 
out of four (or out of 2^ in general) . 

2.1.5 Systematic, Nonsystematic, and Catastrophic Convolu- 
tional Codes . The term systematic convolutional code refers 
to a code on each of whose branches one of the code symbols is 
just the data bit generating that branch. Thus a systematic 
coder will have its stages connected to only n-1 adders, the nth 
being replaced by a direct line from the first stage to the com- 
mutator. Figure 2.8 shows an R*l/2 systematic coder for K»3. 



Figure 2.8 Systematic Convolutional Coder 
for K**3, R-l/2 

It is well known that for group block codes, any nonsys- 
tematic code can be transformed into a systematic code which 
performs exactly as well. This is not the case for convolutional 
codes. The reason for this is that the performance of a code 
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on any channel depends largely upon the relative distance between 
codewords and particularly on the minimum free distance d, which 
is the minimum number of ones of any closed path through node 
"a". Eliminating one of the adders results in a reduction of "d" 
For example, the maximum minimum free distance systematic code 
for K=3 is that of Figure 2.8 and this has d=4, while the nonsys- 
tematic K=3 code of Figure 1.1 has minimum free distance d=5. 
Table 2.1 shows the maximum minimum free distance for systematic 
and nonsystematic codes for K=2 through 5. 


Maximum Minimum Free Distance 


K 

Systematic 

Nonsystematic 

2 

3 

3 

3 

4 

5 

4 

4 

6 

5 



5 

7 


Table 2.1 Comparison of Systematic and 
Nonsystematic R=l/2 Codes 

For large constraint lengths the results are even more widely 
separated. 

A catastrophic error is defined as the event that a finite 
number of channel symbol errors causes an infinite number of data 
bit errors to be decoded. Massey and Sain (Ref. 9 } have shown 
that a necessary and sufficient condition for a convolutional 
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code to produce catastrophic errors is that all of the adders 
have tap sequences, represented as polynomials, with a common 
factor . 

In terms of the state-diagram it is easily seen that catas- 
trophic errors can occur if, and only if, any closed loop path 
in the diagram has a zero weight (i.e., the exponent of D for 
the loop path is zero) . To illustrate this, we consider the 
example of Figure 2.9. 



Figure 2.9 Coder Displaying Catastrophic 
Error Propagation 

Assuming that the all zeros is the correct path, the incorrect 
path a b d d ... d c a has exactly 6 ones, no matter how many 
times we go around the self loop a. Thus, for a BSC, for example, 
four channel errors may cause us to choose this incorrect path 
or consequently make an arbitrarily large number of bit errors 
(equal to two plus the number of times the self loop is traversed) . 
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We observe also that for binary-tree (R=l/n) codes, if each 
adder of the coder has an even number of connections, then the 
self loop corresponding to the all ones (data) state will have 
zero weight and consequently the code will be catastrophic. 

The only advantage of a systematic code is that it can never 
be catastrophic, since each closed loop must contain at least 
one branch generated by a nonzero data bit and thus having a 
nonzero code symbol. Still, it can be shown that only a small 
fraction of nonsystematic codes is catastrophic (in fact, l/(2 n -l) 
for binary-tree R=l/n codes) . We note further that if catastrophic 
errors are ignored, nonsystematic codes with even smaller free 
distance than those of Table 2.1 exist. 

2.1.6 Generalization of Viterbi Decoder to the Additive 
White Gaussian Noise Channel . Figure 2.10 exhibits a communica- 
tion system employing a convolutional code. The convolutional 
encoder is precisely the device studied in the preceding sections. 


Data . 

Convolutional 

Code 

Sequence 

Encoder 

Sequence 

9 j f S 2 • • m * 


Xi , Xj * • *X j • # » 


— 

Memoryless 

Received 

Decoder 

Channel 

Sequence 

(Including 

Modem) 

Yi rv 2 • • • y 



Fig. 2.10 Communication System Employing 
Convolutional Codes 
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The data sequence is generally binary (aj=0 or 1) and the code 
sequence is divided into subsequences where x. represents the n 
code symbols generated just after the input bit a^ enters the 
coder, that is, the symbols of the jth branch. In terms of the 
example of Fig. 1.1, a 3=1 and X 3 = 01. The channel output or re- 
ceived sequence is similarly denoted, y. represents the n symbols 
received when the n code symbols of x^ were transmitted. This 
model includes the BSC wherein the are binary n-vectors 
each of whose symbols differs from the corresponding symbol of 


Xj with probability p and is identical to it with probability 

1 -p. 


For completely general channels it is well known that if all 
input data sequences are equally likely, the decoder which mini- 
mizes the error probability is one which compares the conditional 
probabilities, also called likelihood functions, P(y|x m ), where y 
is the overall received sequence and x m is one of the possible 
transmitted sequences, and decides in favor of the maximum. This 
is called a maximum likelihood decoder . The likelihood functions 
are given or computed from the specifications of the channel. 
Generally it is more convenient to compare the quantities 
InP (y | x m ), called the log-likelihood functions, and the result 

«v 1 Tl 1 - - - , 

is unaltered since the logarithm is a monotonic function of its 
(always positive) argument. It is easily shown that for the BSC 
maximizing the log-likelihood function is equivalent to mini- 
mizing the Hamming distance, as we have done in previous sections. 

We now consider the practical channel of primary interest: 
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namely, the additive white Gaussian noise (AWGN) channel with 
biphase PSK modulation. The modulator and optimum demodulator 
(correlator or integrate-and dump filter) for this channel are 
shown in Fig. 2.11. 


x ll x 12 * * * x ln x 21 x 22 * * * x 2n* * * 



n(t) white Gaussian noise 


Fig. 2.11 Modem for Additive White Gaussian 

Noise PSK Modulated Memoryless Channel 


We use the notation that x.. is the kth code symbol for the jth 
branch. Each binary symbol (which we take here for convenience 
to be ±1) modulates the carrier by ± II/2 radians for T seconds. 
The transmission rate is, therefore, 1/T symbols /second or b/nT= 
R/T bits/second, e is the energy transmitted for each symbol. 

5 

The energy per bit is, therefore, e b =* e g /R. The white Gaussian 
noise is a zero mean random process of one-sided spectral density 
N 0 watts/Hz, which affects each symbol independently. It is 
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readily shown that the channel output symbol y., is a Gaussian 
random variable whose mean is /z~ x., (i.e., + /e~ if x., =+l 

S 3K S 

and -/eTJ if x^ k = -l) and whose variance is N 0 /2. Thus the condi- 
tional probability density (or likelihood) function of y., given 

I* 


x is 


p(y jk |x jk> = exp [ - (y ik - ^ x ik )2/N ° 1 

/rmr 


The likelihood function for the jth branch of a particular code 

path is x . ^ 

~3 


(m> ) “J^^jkl^k 

since each symbol is affected independently by the white Gaussian 
noise and thus the log-likelihood function for the jth branch is 
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n 


■ £ 


k=l 


^jk G s x jk 


(m) 2 - *s£nl[/N t 


n 



N fl k=l 


y x 
r jk ]k 


(m) 


n 



N q k-1 



? *nn/N 0 


- 


y x 
}k ]k 


(m) 


D 


where C and D are independent of m, where we have used the fact 
( rn ) 2 

that x jk = Similarly, the log-likelihood function 

for any path is the sum of the log-likelihood functions for each 
of its branches. 

We have thus shown that the maximum likelihood decoder for 
the memoryless AWGN biphase modulated channel is one which forms 
the inner product between the received (real number) sequence 
and the code sequence (consisting of ± l’s) and chooses the path 
corresponding to the greatest. 
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Thus the metric for this channel is the inner product as contrast- 


ed with the distance* metric used for the BSC. 

For convolutional codes the structure of the code paths was 
described in Sections 2.1.1 - 2.14. In Section 2.1.2 the optimum 
decoder was derived for the BSC. It now becomes clear that if we 
substitute the inner product metric ^y ^ k x^ k ^ for the distance 
metric ^ cL v ^ , used for the BSC, all the arguments used in 
Section 2.1.2 for the latter apply equally to this Gaussian Channel. 
In particular, the Viterbi decoder has a block diagram represented 
by the code state-diagram. At step j the stored metric for each 
state (which is the maximum of the metrics of all the paths lead- 
ing to this state at this time) are augmented by the branch metrics 

for branches emanating from this state. The comparisons are per- 

t) 

formed among all pairs of (or in general sets of 2°) branches 
entering each state and the maxima are selected as the new most 
likely paths. The history (input data) of each new survivor must 
again be stored and the decoder is now ready for step j+1. 


2,1.7 Metric Quantization, Path Memory Truncation, and Other 
System Considerations . As we have just shown, the optimum metric 
for the biphase modulated AWGN channel is the inner product (or 
correlation) metric. However, since the y„ are real numbers, a 
practical digital implementation requires quantization prior to 


* Actually it is easily shown that maximizing an inner product is 
equivalent to minimizing the Euclidean distance between the cor- 
responding vectors. 



forming the metric. 

In particular, if we quantize the y„ to Q levels symmetric 
about zero, then the biphase AWGN channel is converted to a binary- 
input Q-output symmetric channel. Generally we choose Q=2^ so 
that each received symbol can be represented by a "q" bit word. 

The optimum metric in this case is the log-likelihood function of 
this new binary-input Q-output channel. However, it has been 
found by simulation that nearly equivalent performance is obtained 
if the inner-product metric ia used with v^ . replaced by Q (y ^ ) , 
where Q(y^.) is an integer between 0 and Q-l corresponding to the 
quantizer output level for an input y^. l n fact, extensive 
simulation has shown that using this metric with 8-level quanti- 
zation causes a performance degradation which is equivalent to a 
reduction of E b /N Q by less than 1/4 db for any given error 
probability level. On the other hand, quantization to 2 levels 
(which amounts to reducing the AWGN to a BSC) causes an effect- 
ive reduction of E^/N^ by approximately 2 db. 

Another major problem which arises in the implementation of 
a Viterbi decoder is the length of the path history which must 
be stored. In our previous discussion we ignored this important 
point and therefore implicitly assumed that all past data would 
be stored. A final decision can be made by forcing the coder 
into a known (all zeros) state, but this is totally impractical 
for long data sequences, for it requires storage of the entire 
trellis memory for each state. Suppose we truncate the path 
memories after L bits (branches) have been accumulated, by 



K 

comparing all 2 metrics for a maximum and deciding on the bit 

K 

corresponding to that path (out of 2 ) with the highest metric 
L branches forward. If L is several times as large as K, the 
additional bit errors introduced in this way are very few. It 
can be shown that the additional error probability due to path 
truncation, based on the largest path metric L branches beyond 
where the decision is to be made, is of the order of a block 
coding error for a code of block length l bits. Both theory and 
simulation then indicate that by making L four to five times as 
large as the code constraint length K, we can ensure that such 
additional errors have only a slight effect on the overall bit 
error probability. 

Of course, basing the decision upon the maximum metric L 

branches forward may require a costly implementation to compare 
K 

all 2 state metrics. Other decision techniques, based on ma- 
jority polling and metric overflow monitoring, are much less 
costly and appear to yield the same or better performance when 
L is increased slightly. 

Cost and complexity of implementation of a Viterbi decoder 

depends strongly on constraint length, K, quantization, and 

speed. It depends much less strongly on path memory size, L, 

and the path truncation decision technique. In particular, the 

cost rises exponentially with K, but of course, the performance 

also improves with increasing K. Typically for a rate 1/2 code 

on an 8-level quantized AWGN, the required E, /N for P_ z 10" 5 

Do *-* 

is reduced by about 0.4 db per unit increase of K in the range 
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between 3 and 8. The cost of increasing L is only linear, but it 
is not justified on the basis of performance beyond L z 5K. The 
cost of finer quantization depends strongly on the data speed 
requirements. The performance improvement from 2 level to 8 level 
quantization is nearly 2 db in E^/Ng but there is less than 0.25 
db to be gained by using more than 8 levels. 

For sufficiently low data speeds, all of the metric calcula- 
tions and comparisons can be done serially, thus significantly 
reducing cost and complexity. At very high speeds, where digital 
qate speeds are only a few times faster than the received symbol 
rates, all metric computations and comparisons must be made in 
parallel. In intermediate speed regions, serial-parallel combina- 
tions may be possible. 

Detailed consideration of Viterbi decoder implementation and 
system designs will be treated in Section 2.4. 

2 . 2 Rate 1/2 Convolutional Codes and Viterbi Decoders . In 
this section, the performance of rate 1/2 Viterbi decoders is 
examined in detail. Bit error rate vs. E^/Ng obtained both by 
simulations and analysis are presented for optimum codes of con- 
straint lengths 3 through 8. Particular attention is paid to the 
sensitivity of performance to the decoder parameters which in- 
fluence complexity and cost. Also of interest is the ability of 
a Viterbi decoder to withstand demodulator imperfections, and its 
usefulness in communicating system quality information. 

Computer simulation of Viterbi decoders is a useful technique 
for evaluating performance down to a bit error rate of about 10” 4 
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to 10“ 5 , depending on code constraint length. Simulations at 
lower error rates require prohibitively long computer runs to 
obtain meaningful data. Fortunately, an upper bound on both 
event and bit error rates has been derived which is very tight 
for error rates of about 10“ 5 and lower. A combination of the 
simulations and the numerically evaluated upper bound presented 
here provides a complete picture of Viterbi decoder performance 
over a wide range of error rates. 

v / 2.2.1 Good Convolutional Codes . One obvious criterion for 
selecting codes is bit error probability. Unfortunately, obtain- 
ing bit error probability through simulation is too time consuming 
to be used as a method of sifting through a large number of con- 
volutional codes. A much more useful measure of a code is its 
minimum free distance. As defined previously, the free distance 
between two code words is the Hamming distance between them from 
the state in the trellis at which they diverge (the point at 
which the information bits begin to differ) , to. the state where 
they remerge (after K-l identical information bits). A set of 
large free distances between the correct code path and the 
competing incorrect oaths is desirable with Viterbi decoding. 

This is because the greater the free distance, the more channel 
errors must occur in order for an incorrect path to look more 
likely than the correct path. 

The minimum free distance , d^ f is the smallest value of 
free distance between the correct path and any other path* Since 
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the codes under consideration are linear codes, the set of 
distances from any codeword to all other codewords is the same 
as the set of distances from the all zeros codeword to all other 
codewords. Thus, is the minimum of the weight of all code- 
words from the point at which they diverge from, until the point 
at which they remerge to, the all zeros path. Often, but not 
always, the minimum weight path corresponds to an information 
sequence with a single 1 in it. The codeword associated with 
this sequence diverges from the all zeros path where the infor- 
mation 1 occurs, and remerges K-l branches later. This, of 
course, is the shortest length over v?hich two distinct paths can 
be diverged. 

Using the alqebraic properties of linear group, codes, an 
upper bound on the minimum free distance of a convolutional code, 
as a function of constraint length, has been found (Ref. 1,2)'. 
For rate 1/n nonsystematic codes, the bound is 



<_ min 
h 



( (K+h-1 ) n ] 


This bound provides a target value of d f which can be used when 
searching for good codes. If a code is found with a d^ which 
satisfies the bound with equality, it is immediately known that no 
code exists with a larger minimum distance. Of course, maximiz- 
ing minimum free distance does not necessarily minimize decoder 
error probability. The number of codewords having the minimum 
distance, as well as the distribution of codewords at distances 




somewhat greater than d^, are also important. After preselecting 
codes based on minimum free distance, these other factors are 
useful in final code selection. Simulations and numerical code 
evaluation indicates that choosing codes with maximum minimum 
free distance, taking into account the number of paths at this 
distance, and if necessary, slightly larger distances yields 
codes with minimum error probabilities with Viterbi decoding. 

The optimum rate 1/2 codes for K=3 through 8 were found by 
Odenwalder (Ref. 3). They are tabulated in Table 2.2.1. For 
each constraint length the table shows the optimum code gener- 
ators, the actual d, for the code, the number of errors n in 

f e 

all of the codewords at the minimum distance, and the upper bound 
value on minimum free distance d^*. 

2.2.2 Numerical Code Performance Bound . One of the two 
principal tools used in evaluating the performance of convolu- 
tional codes in this study has been an upper bound on error 
probability related to the convolutional code transfer function. 
The bound is extremely tight for high E^/N^ (low decoder error 
rates) where computer simulation is impractical, due to the 
prohibitively long times required to collect significant data. 

It has been shown (Refs 5,6) that a union bound on the per- 
formance of a convolutional code on memoryless channels can be 
obtained from the directed-graph state diagram of the coder. 

For example, the optimum constraint length K=3, rate 1/2 coder 
is shown in Figure 2.2.1. The states correspond to the con- 
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Code 

Generators 

d f 

n e 

V 

3 

111 

101 

5 

1 

5 


1111 

1101 


11101 

10011 


111011 

110001 


1111001 

1011011 


11111001 

10100111 




s 


Table 2.2.1 Optimum Rate 1/2 Codes. d f is 
the code minimum free distance, n Is the 
number of bit errors in paths at distance 



* is the upper bound on minimum free 
ce. 
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Fig. 2.2.1 Code and State Diagrams for K«3 Code 
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tents of all but the first stage of the coder register, when a 
new information bit has just entered the first stage. The 
exponent of D(0, 1, or 2) is the weight of the (two symbol) 
vector output at this time, and the exponent of N(0 or 1) indi- 
cates whether a 0 or 1 information bit has just entered the coder. 

Regarding the all zeros node as both the input and output of 
the graph, the transfer function of any path through the tree is 
defined as the product of the branch transfer functions along 
that path. For example, the transfer function of the path 
corresponding to the information sequence 10100 is 

T ioioo " (ND Z ) (D) (N) (D) (D 2 ) =N Z D S (2.2.1) 

The transfer function of the graph is the sum of the transfer 
function of all paths starting and ending in the all zeros state. 
The general form of this transfer function is 

d_ d^+1 d f +i 

T (N, D) = D r f (N) + D 1 f 2 (N) + ... + D f i+1 (N) 

‘ ' ( 2 . 2 . 2 ) 

Here d^ is the minimum free distance of the code. Notice that 
the exponent in the path transfer function (Eq. 2.2.1) is the 
weight of the code symbols on the particular path through the 
graph of Fig. 2.2.1. Therefore, with N=l, the terms in the 
transfer function T(1,D) are of the form 
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d +i 

D f i+l ^ 

where f^ + ^(l) is just the number of paths at distance d^+i. 

For the unquantized, additive white Gaussian noise channel 
with PSK modulation, the error probability between the all zeros 
(correct) path, and another path which diverges from and returns 
to the all zeros path, is bounded by (Ref. 4) 


p 2 < e - dE = /N » 


where d is the weight of the competing path. E is the code 

s 

symbol energy and N fl is the noise spectral density. For example, 
if the competing path corresponded to the information sequence 
10100, the bound is obtained from Eq. (2.2.1) 


= -6E /N 0 

N=l, D=exp(-E g /N o ) 

Likewise, a union bound on first event error probability due to 
all paths competing with the all zeros path (all paths through 
the graph in Fig. 2.2.1) is 

N-l, D=exp (-E g /N 0 ) (2.2.3) 

In order to get a bound on bit error probability, we note 
that the exponent of N in a path transfer function is the number 


P E < T (N, D) 


< T 


10 10 0 
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of information l's (errors) on that path. A union bound on bit 
error probability would be obtained if the path transfer function 
were weighted by the number of bit errors on the path. One 
simple way of doing this is to take the derivative of T(N,D) with 
respect to N. This brings down the exponents of N — the number 
of bit errors on a path — into the coefficients. The bound on 
bit error probability is therefore 


P„ < dT(N,D) 

D 


dN 


N=l, D=exp(-E g /N o ) 


(2.2.4) 


For the Gaussian channel these bounds can be tightened some- 
what (Ref. 5) : 


P E < erfc (/a £ E s /N„ ) D ' d f T(N(D) 


N=1 


(2.2.5) 


P B < erfc ( /d -E /N ) D f ~ 

£ s' 0 dN 


N=1 (2.2.6) 


with D = exp(-E /N ) in both cases. 

The difficulty of this approach is that the number of states 
grows exponentially with K and consequently the tedium involved 
in direct computation is effectively insurmountable for K>4. 

On the other hand, the calculation of the transfer function 
is equivalent to a matrix inversion. Taking into account the 
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particular properties of a convolutional code transfer matrix, 

the transfer function can be evaluated numerically using an 

iterative technique. This technique is explained in detail in 

Appendix A. A computer program has been written to evaluate the 

transfer function bound as a function of K, code rate, and E, /N . 

'bo 

For rate 1/2 codes the performance bounds are presented in 
Section 2.2.4, for other rates, they are contained in Section 2.3. 

Decoder performance predicted by the bounds at around 10“ 5 
bit error rate is quite close to simulation results, allowing for 
finite receiver quantization in the simulations. 

2.2.3 Viterbi Decoder Simulation Program . A program has 
been written to simulate the operation of a Viterbi decoder on a 
quantized Gaussian channel. The program is quite flexible, in 
that all of the parameters of interest in Viterbi decoding can be 
varied by changing program inputs. For rate 1/2 codes, the con- 
straint length can be varied from 3 to 9. Simulated received 
data for PSK modulation on an additive white Gaussian noise 
channel can be generated for any value of E, /N 0 with output 
quantization of 2, 4 and 8 levels. For each run the program 
provides : 

a) Bit error rate for a variety of decoder path lengths, 
with output selection based on the most likely state. 

b) Event error rate. 

c) Average length in bits of an error event, from the 
first bit error to the last comprising the event. 

In addition to the statistics in a) , b) and c) , which are 
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based on maximum likelihood state output selection, the follow- 
ing measured error rates are provided: 

d) Bit error rate with majority output selection. Here 
the output on a majority of the decoder paths is 
chosen as the decoder output. 

e) Bit error rate resulting from selecting an output from 
some state path whose metric value is better than some 
threshold value. 

All of the gathered statistics except those in a) are for 
length 32 decoder state paths. The output decision technique 
simulated in d) and especially e) , although they are slightly 
sub-optimal, are much simpler to implement in parallel process- 
ing Viterbi decoders than maximum likelihood selection. 

Lastly, with an eye toward the 180° phase ambiguity problem 
with PSK modulation, the simulation program measures 

f) Bit error rate when differential data encoding-decoding 
is used with codes transparent to 180° phase flips. This 
technique, along with the simulation results, is treated 
in Section 2.2.5. 

2.2.4 Simulation and Numerical Performance Data 

2.2. 4.1 General Performance Results . The principal results 
of the simulations and code transfer function bounds are shown in 
Figs. 2. 2. 4.1, 2. 2. 4. 2, and 2. 2. 4. 3. All of these figures show 
bit error rate vs. E, /N for Viterbi decoders using the optimum 
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Fig. 2. 2. 4. 2 Bit Error Rate vs. 
E /N 0 for Rate 1/2 Viterbi 
Decoding. 8-Level Quantized 
Simulations with 32 Bit 
Paths, and the Infinitely 
Finely Quantized Transfer 
Function Bound, K=4,6,8. 
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rate 1/2 convolutional codes of Table 2.2.1. In all cases, the 
decoder state path length was 32 bits. In all simulation runs, 
at least 25 error events contributed to the compiled statistics. 

The simulation results in Figs. 2. 2. 4.1 and 2. 2. 4. 2 are for 
soft (8-level) receiver quantization. Equally spaced demodula- 
tion thresholds are used at ±1.5a, ±a, ±0.5 , and 0 where a 2 - Nq/ 2 
is the noise variance. This choice of 8-level quantizer thres- 
holds is within a broad range of near optimum values, as will be 
shown presently. The transfer function bound is for infinitely 
finely quantized received data. Allowing for the 0.20 to 0.25 
db loss usually associated with 8-level receiver quantization 
compared with infinite quantization, the transfer function bound 
curves are in excellent agreement with simulation results in 
the 10 - * 4 to 10 -5 bit error rate range. 

Since the accuracy of the transfer function bound increases 
with E b /N 0 , decoder performance can be ascertained accurately in 
the 10- 5 to 10 -8 region even in the absence of simulations. 

Ideally, the symbol metrics associated with each of the 8 
quantization levels would be proportional to the log-likelihood 
of receiving the given level, given the hypothesis of a "0" or 
a "1" transmitted. In the interest of keeping the number of 
bits required to represent metrics to a minimum, it was shown 
(Ref. 2) that equally spaced symbol metrics, for instance, the 
numbers 0-7, could be used with negligible performance degrada- 
tion. Vie have taken the compression of metric representation one 


- 40 - 



step further. As is shown in section 2.4, an additional bit 
in the state metric can be saved if levels symmetrically located 
about the zero threshold have symbol metrics which are the nega- 
tives of one another. Thus, for the simulations presented in 
Figs. 2. 2. 4.1 and 2. 2. 4. 2, the eight symbol metrics uped were 
4, 3, 2, 1, -1, -2, -3, -4. These symbol metrics clearly do not 
change in equal increments; however, simulations have shown that 
system performance does not suffer significantly. 

Fig. 2. 2. 4. 3 gives the simulation results for yiterbi decoding 
with hard receiver quantization. The same optimum rate 1/2, 

K=3 through 8 codes were used here as in the 8-level quantized 
simulations . 

Several points are obvious from the performance curves 

a) 2-level quantization is everywhere close to 2 db 
inferior to 8-level quantization. This seems to rein- 
force the folk theorem that hard quantization always 
leads to a 2 db loss in system efficiency. 

b) Each increment in K provides an improvement in efficiency 
of something less than . 5 db at a bit error rate of 10“ 5 . 

c) Performance improvement vs. K increases with decreasing 
bit error rate. 

2. 2. 4. 2 Receiver Quantization . In order to observe the 
effects of varying receiver quantization more closely, simulation 
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performance data is presented in Fig. 2. 2. 4. 4 for the K=5, rate 
1/2 code, with 2, 4, and 8-level receiver quantization. The 
8-level thresholds and metrics are identical to those of Fig. 

2. 2. 4.1. In fact, the 2 and 8 quantization level curves are 
taken from Figs. 2. 2. 4. 3 and 2. 2. 4.1 respectively. The 4-level 
thresholds were set at 0 and ± a. The metrics were chosen to be 
2, 1, -1, -2, for the same reasons which suggested the 8-level 
metrics . 

2. 2. 4. 3 Path Memory. The Viterbi decoder is a maximum 
likelihood decoder only when its decision path memories are 
infinitely long. That is, decoding delay is infinite. For 
practical purposes, it is desirable to use path memories as short 
as possible. There is a path memory for each state in a Viterbi 
decoder. Providing storage and managing decision paths is a 
significant part of any Viterbi decoder. It is therefore worth- 
while to study the performance degradation vs. path length for 
Viterbi decoding. 

Fig. 2. 2. 4. 5 shows bit error rate performance vs. E b /N fl for 
three path lengths (8, 16 and 32) using the rate 1/2, K=5 code, 
for both 2 and 8-level received data quantization. The length 
32 path curve is identical to the K=5 curve in Fig. 2. 2. 4.1. 
Performance with length 32 paths is essentially identical to that 
of an infinite path decoder. Even for a path length of only 16, 
there is only a small degradation in performance. Other simula- 
tions have shown that a path length of 4 to 5 constraint lengths 


- 42 - 



Bit Error Probability 






Bit Error Probability 



3 


4 


6 


7 


8 






is sufficient for other constraint lengths as well. 

2. 2. 4. 4 Decoder Output Selection . In a Viterbi decoder with 
finite path memories, it is possible that not all state paths are 
merged at the point at which a decoded bit must be output. Phy- 
sically this means that the oldest bits in each of the state path 
memories may not always agree. The decoder must output a bit 

K — 1 

however and there must be a means for selecting which of the 2 
oldest path bits to output. 

The optimum method for selecting output bits is to choose the 
bit corresponding to the path with the best metric. This selection 
rule is very complex to mechanize in a high speed decoder, where 
the pairwise state comparisons are done in parallel. This fact 
has lead to a study of simpler output selection schemes, the 
aim being to find one which does not degrade performance appre- 
ciably. One very simple scheme is to choose a path at random from 
which to output decoded bits (or always output bits from the same 
path). This scheme, however, has been found to significantly de- 
grade performance. In fact, if a path memory of n bits is required 
for a given performance goal with maximum likelihood output selec- 
tion, then simulation has shown that a memory of up to 2n bits is 
required for the same performance with arbitrary output selection. 

Another method is to output a "0" if a majority of the 2 
paths have a "0" as their oldest bit; otherwise, output a "1". 

This scheme is somewhat simpler to implement than maximum likeli- 
hood selection. 
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An efficient yet simple to implement scheme, which we have 
devised, is to select the output from some state path whose mct- 
tic is better than a certain threshold. This scheme is described 
in Section 2.4. It is called "less than four" selection. Fig. 

2. 2. 4. 6 compares the performance of a) maximum likelihood selec- 
tion, b) majority selection, and c) "less than four" selection. 

The comparison is made for a K=7, rate 1/2 code with 3-level 
quantization, and path lenqth 32. It is interesting to note that 
the performance of the K=5 decoder was the same for all three out- 
put selection schemes, with a 32 bit path memory. As the path 
memory gets long relative to K, there is a larger probability that 
all state paths will be merged by the time a bit must be output. 
Thus the output selection mechanism has less of an effect on 
performance. 

2.2.5 Code Synchronization and Channel Reliability . 

2 . 2 . 5 . 1 Node Synchronization and Phase Ambiguity Resolution . 
Because of the inherent continuity involved in convolutional cod- 
ing, code synchronization at the receiver is usually much simpler 
than in the case of block codes. For convolutional decoding 
techniques involving a fixed number of computations per bit de- 
coded, such as Viterbi decoding, the decoder initially makes an 
arbitrary guess of the encoder state to start decodinq. If the 
quess is incorrect, the decoder will output several bits pr, at 
most, tens of bits of unreliable data before assuming steady 
state reliable operation. Thus, the block synchronization pro- 
blem does not really exist. There remains the problem of node 
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synchronization and, depending upon the modulation-demodu- 
lation technique used, the problem of 2 or 4 phase ambiguity 
resolution. For a rate 1/n code, there are n code symbols on 
each branch in the code tree. Node synchronization is obtained 
when the decoder has knowledge of which sets of n symbols in 
the received symbol stream belong to the same branch. In a 
purely serial received stream, this is a 1 in n ambiguity. 

In addition, modems using biphase or quadriphase PSK with 
suppressed carriers derive a phase reference for coherent de- 
modulation from a squaring or fourth power phase lock loop or 
its equivalent. This introduces ambiguities in that the squaring 
loop is stable in the in-phase and 180° out of phase positions, 
and the 4th power loop is, in addition, stable at ±90° from the 
in-phase position. 

We have directed our efforts toward using the error detec- 
tion capability of convolutional decoders, or the ability of the 
decoder to detect unsatisfactory system of operation, to detect 
and correct for incorrect node synchronization and the occasional 
phase flips in the phase tracking loop. It is now apparent that 
simple and effective techniques for maintaining node and phase 
synchronization completely within the decoder itself are feasible. 

For the purpose of ease of illustration, the rate 1/2, hard- 
quantized receiver output case will be considered here. Techniques 
generalize easily to soft quantization and with somewhat more 
complexity to other rates. A Viterbi decoder operating on hard 
quantized received data will use Hamming distance for state metrics. 
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Relatively large metric values indicate a poorer match to received 
data than lower metric values. The smallest state metric at any 
time corresponds to the path with the best match to received data, 
and the metric itself is equal to the number of discrepancies be- 
tween the received data and that path. Clearly, when the decoder 
is in correct node synchronization and the demodulator loop is 
locked properly, the path with the smallest Hamming distance will 
usually correspond to the correct path. The rate of increase of 
this path metric will depend on the channel error rate. For in- 
stance, if the crossover probability is p=.02 then, on the average, 
there will be an increase of 1 in the correct path metric for 
every 50 channel symbols. On the other hand, we intuitively ex- 
pect that if node synchronization is lost, or if the phase lock 
loop locks onto an incorrect phase position, the match between the 
received data at the nearest codeword should be much poorer than 
1 mismatch in 50 symbols. If, in fact, the best path metric in- 
creases more rapidly when off node or phase synchronization, this 
can be used to detect these maladies. 

One simple technique would use an up-down counter to detect 
unreliable system operation. The counter counts up by k units 
(k > 1 ) each time the best path metric increases by 1, while it 
counts down by 1 for each bit time. The parameter k is chosen 
so that the average drift of the counter is downward when in 
proper node and phase synchronization and upward when out of 
either phase or node synchronization or both. The first condi- 
tion requires that kp<l where p is the channel crossover 
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probability; while, the second condition requires kp'>l where 
p' is the as yet undetermined rate of increase of the best path 
metric with improper node or phase synchronization. The count 
would not be allowed to fall below zero. When the count exceeds 
a threshold value N, the assumed node synchronization or phase 
synchronization position is changed in the decoder according 
to a preset strategy. 

Potentially effective methods of changing phase and node 
synchronization depend upon whether the system uses BPSK or QPSK. 
For BPSK, both the problem of node synchronization and 180° 
phase ambiguity exist. The 180° phase ambiguity can be cir- 
cumvented by using differential encoding of the information prior 
to convolutional encoding and differential decoding after the 
channel decoder. A transparent convolutional code is required, 
i.e., the all l's data sequence maps into the all l's code 
sequence. This technique is discussed in section 2. 2. 5. 2. 

Another method of homing in on both node and phase 
synchronization is shown in Fig. 2. 2. 5.1. 



change phase 
sync 180° 


Figure 2. 2. 5.1 Biphase node-phase synchronization 

strategy. 
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State 1 represents an arbitrary initial position for phase 
and node synchronization. If the system reliability counter 
counts past N, a node synchronization change is attempted, 
bringing the system to state 2. If the count again passes the 
threshold, received symbols are complemented prior to decoding 
(equivalent to a 180° phase shift) bringing the system to state 3. 
Subsequent counter overflows cause the system to go to state 4 
and then back to 1. There are only 4 possible states because 
there are 2 stable phase positions and, for rate 1/2, 2 node 
synchronization positions. 

For a quadriphase modem and rate 1/2 , there is no node 
synchronization problem since both parity bits on a branch are 
transmitted on one baud. However, now there are 4 stable phase 
positions; thus, there are 4 stable system states to contend 
with just as in the biphase case. These states, as well as the 
phase synchronization strategy are shown in Figure 2. 2. 5. 2. 


90° phase change 



90° phase change 


Figure 2. 2. 5. 2 Quadriphase phase synchronization 

strategy. 
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The problem still remains as to whether the best path metric 
will increase rapidly enough, when phase or node synchronization 
is incorrect, to reliably detect these events while keeping the 
false alarm rate sufficiently low. Understanding of the operation 
of the decoder when synchronization is lost is aided by observing 
the effect of synchronization loss on a hypothetical code 
syndrome calculated at the receiver. 

These effects can best be seen by working with the poly- 
nomial representations of the information, parity, and code 
generator sequences. The polynomial coefficients are the terms 
in the sequence of interest. For instance, the information 
sequence 101101... is represented by the polynomial 

i (D) = 1+ D 2 + D 3 + D s +. . . 

The parity stream generated by passing i (D) through a convolutional 
encoder with generator g(D) is simply 

p(D) = i(D)g(D) 

A rate 1/2 code has two generators g (D) and g (D) which generate 

1 2 

two streams p (D) and p (D) . A representation of the encoder, 

1 2 

channel and syndrome calculator is shown in Figure 2.2.5. 3. The 

n i (D) are channel noise polynomials, a coefficient of 1 represents 

a channel error. All additions are mod-2. 

Notice that in the absence of noise, i.e., n (D) = n (D) = 0, 

1 2 

s(D) « i(D)g (D)g (D) + i(D)g (D)g (D) 

12 2 1 

= 0 

independent of i(D); thus, the syndrome s (D) depends only upon the 
noise polynomials and hence is in fact a true code syndrome. 
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Figure 2.2.5. 3 Representation of encoder, channel and 
syndrome calculator. 


A hypothetical decoder operating on a segment of the syn- 
drome polynomial should attempt to find the minimum weight 
channel error polynomials which could have caused the particular 
syndrome pattern. Decoding would then reduce to changing those 
bits corresponding to ones in the minimum weight error pattern. 

We shall now look at the functional form of the syndrome in the 
presence of various combinations of improper phase and node 
synchronization. 

Considering the biphase modulation case first, suppose 
phase synchronization is correct and node synchronization is off. 
This is equivalent to the received lines being crossed in Fig. 

2. 2. 5. 3. Thus: 

r j (D) = i (D) g 2 (D) 
r 2 (D) = i (D) g i (D) 

s(D) = i (D) [g* (D) + g 2 (D) J , 
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in the absence of noise. Unlike the in-synchronization case, 
the syndrome now depends on the information sequence. In fact, 
the syndrome is the information sequence convolved with a new 
generator g* (D) + g* (D) . If the coefficients of i (D) are EL 
(that is independent and equally likely to be 0 or 1) then the 
coefficients of s(D) also have this property (Ref. 4). 

Now suppose node synchronization is correct and the phase 
flipped 180°. Here 

r i (D) = i (D) g i (D) 
r 2 (D) - i (D) g 2 (D) 

s(D) = i(D)gi (D) g 2 (D) + i (D) g 2 (D) g t (D) 
in the absence of noise. If the code were transparent to 180° 
phase flips, both g } (D) and g 2 (D) would have to be odd, that is, 
have an odd number of non-zero terms (coder taps). Since we do 
not want the transparent feature in this analysis, we will 
assume g i (D) is odd and g 2 (D) is even (if they were both even the 
code would be catastrophic). Now a sequence convolved with an 
even generator is the same as the result of convolving the 
sequence with the same generator; whereas, a sequence convolved 
with an odd generator is the complement of the result of convolv- 
ing the complemented sequence with that generator. Thus, 
s(D) « i (D) g t (D) g z (D) + i (D)g 2 (D^ (D) 

= 1 + D+ D 2 + D 3 +... 

Hence without noise, a 180° phase flip causes an all l's syndrome. 

When both improper node and phase synchronization occur, it 
is easily shown that without noise 

s(D) = i (D) |g*(D) + g 2 (D)j 
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Again this syndrome is EL if i (D) is EL. 

In the quadriphase situation, the tracking loop can be ±90° 
or 180° out of proper phase alignment. When a 90° shift occurs, 
the in-phase and quadrature channel outputs are switched with one 
becoming complemented. For instance with a 90° shift, 

r, (D) = i (D) g (D) 
r 2 (d) = TT5T57W 

Again assuming q^ (D) is odd and g^(D) is even 

s (D) = i (D) [gj (D) + g* (D) ] 

This is the same syndrome as obtained with improper node 
synchronization. A -90° phase shift yields the syndrome 

s(D) = i ( D ) [g* (D) + g | ( D) ] 

This is the same syndrome as that derived for the biphase 
improper node synchronization, and 180° phase shift case. 

Finally the 180° phase shift case is identical to that for the 
corresponding biphase situation. 

Evidently, the quadriphase and biphase situations are quite 
similar. Quadriphase modulation entails no more ambiguities than 
biphase because the former provides node synchronization for free. 
Both techniques yield an all l's no noise syndrome when locked 
180° out of phase. Quadriphase modulation yields an EL syndrome 
when ±90° out of phase, while biphase does the same when out of 
node synchronization for both stable phase situations. The fact 
that the syndrome is EL depends, of course, on i (D) being EL. It 
can be readily seen from the off synchronization syndrome equa- 
tions that certain information sequences will yield an all zero's 



syndrome even when node or phase synchronization is incorrect. 

The all zeros and all ones information sequences are particularly 
bothersome in this regard. 

An EL syndrome will be obtained if node synchronization is 
out or the phase has flipped by ±90°; in addition, the syndrome 
will be EL if one or both of the n^(D)'s is EL, Thus an EL 

received stream will cause an EL syndrome. Bounds on the 

minimum distance of codewords (Ref. 7) shows that for the best 
rate 1/2 code, asymptotically the minimum distance increases as 
.lln where n is the code lenth in symbols. Thus the nearest 
codeword to an EL received sequence will tend to differ in about 
1 symbol in every 9. This means when the decoder is out of node 
or phase synchronization, the best metric will on the average 
increase by 1 for each 9 received symbols. Actually, the real 
rate of increase may be somewhat greater than this because the 

Viterbi decoder will use a short constraint length code. The 

.lln asymptotic figure is approached in the limit of longer codes. 
The closest codeword to a random sequence will tend to be further 
away when the constraint length is short. 

As mentioned previously, the factor k and the threshold N 
must be chosen such that the counter used to indicate data quality 
drifts upv/ard when synchronization is incorrect and drifts down- 
ward when synchronization is correct. The actual values chosen 
for these parameters will determine: 

(a) the expected time to first passage over the threshold, 
and hence change of system state, when node or phase 


- 56 - 



synchronization is incorrect. We will call the ex- 
pected time to re-synchronization, E . 

JL S 

(b) the expected first passage time when node and phase 

synchronization is correct. This is the expected time 
to false alarm, E^ a . 

Obviously we want to make E rg as short as possible and E^ a 
as large as possible. Analytical techniques have been success- 
fully used to approximate E rg and E^ a as a function of k, T and p. 
The analysis is based upon recognizing that the input to the 
up-down counter is a random walk with a reflecting boundary at 
zero and an absorbing boundary at N. First passage times are 
computed by solving the appropriate Fokker-Planck equation. This 
work is reported in detail in Appendix B. 

In addition to the approximate analysis, simulations have 
been performed to tie down E rg and E^ a more precisely. The 
count-up rate, k, was chosen to be 8 because that value is nearly 
optimum, and it is a power of 2 and, therefore, easy to implement. 
The rate 1/2, K=*5 code was used with hard receiver quantization. 
Fig. 2. 2. 5. 4 shows E^ a vs. the threshold T. This limited data 
supports the analytical results which state that E fa rises 
exponentially with T. Naturally, for a given T, E^ a is larger 
for smaller channel crossover probabilities p. 

Fig. 2. 2. 5. 5 shows the other parameter E rg as a function of 

T. E also rises with T, but more slowly. In fact it is nearly 
rs 

linear with T, as predicted by theory (Appendix B) . E is not a 

rs 

function of p because it is the expected time to re-sync when the 
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Fig. 2. 2. 5. 4 Average number of bits decoded be- 
tween false loss of sync events vs. sync counter 
threshold — p is the channel crossover prob- 
ability. 
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2. 2. 5. 5 Average number of bits to reco 
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ounter. threshold (hard decision decoder 



decoder is out of sync. When out of sync, the received data has 
the same random statistics regardless of the noise. 

These results show that a value of T can be selected to make 
E fa truely negligible, while maintaining E rs , the resync time, as 
low as several hundred to a thousand bits. The increase in 
system bit error rate due to false loss of sync is 


2E rs /E f. 

This is because, on the average, for E fa bits decoded sync is 
lost once. It takes, on the average, 2E to return to the proper 

J>9 

sync state. 


2. 2. 5. 2 Transparent Codes . As mentioned in the previous 
section, another way to resolve 180° phase ambiguities is to use 
a code which is transparent to 180° phase flips, precode the data 
differentially and use differential decoding. A transparent bode 
has the property that the bit-by-bit complement of a codeword is 
also a codeword. Such a code must have an odd number of taps on 
each of its encoder mod-2 adders. This insures that if a given 
data sequence generates a certain codeword, its complement will 
generate the complementary code word. 

If the received data is complemented due to a 180° phase 
reversal, it will still look like a codeword to the decoder, and 
will likely be decoded into the complement of the correct data 
sequence. Now decoding to the complement of the sequence input 
to the encoder is no problem if the data was precoded differential- 
ly. This means that information is contained in the occurrence or 

non-occurrence of transitions in the encoded output sequence rather 
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than the absolute sequence itself. These transitions occur in 
the same places even if the decoded sequence is complemented. 

The major fault with this scheme is that when an isolated 
bit error occurs in the decoder output, it causes two differen- 
tially decoded errors, since two transitions are changed. At 
first glance, this would seem to indicate a doubling of the 
output bit error rate. In fact, this doubling does not occur 
because errors typically occur in short bursts. Two adjacent bit 
errors, for instance, cause only two differentially decoded bit 
errors. This indicates the possibility of only a small increase 
in bit error rate with differential encoding-decoding. 

Fig. 2. 2. 5. 6 shows bit error rate performance curves for the 
K=*7 , rate 1/2 code with and without differential encloding- 
decoding. The degradation is error probability is least at low 
E b /N 0 . Here decoder error bursts are relatively long (on the 
average one to two constraint lengths), so differential encoding- 
decoding loses very little. At higher E b /N Q , bit error rate 
degradation is slightly larger— but nowhere near a factor of two. 
The worst bit error rate degradation factor is about 1.2 over 
the range shown. Using differential encoding-decoding, the E b / N 0 
required increases by less than 0.1 db. K=7 was selected for 
this example because the optimum code for this constraint length 
is transparent (see Table 2.2.1). 

The use of differential encoding-decoding reduces the 
synchronization problem to 2-state node synchronization with BPSK, 
and 2-state 90° phase ambiguity resolution with QPSK. 
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Bit Error Probability 



- 62 - 


2 . 2 . 5 . 3 Channel Reliability Information . The up-down 
counter used for node and/or phase synchronization also provides 
a means for very sensitive measurement of channel performance. In 
the absence of decoder errors, the counter counts up whenever a 
received symbol with a channel error is processed. Thus, the 
number of times the counter counts up per unit time is directly 
proportional to the channel error rate p. This will be true as 
long as the decoder output error rate is low, which corresponds 
to good system performance. If p becomes large due to a system 
failure or loss of code sync, the count will tend to remain above 
zero. 

One method of monitoring system realiability is to sum the 
count over a number of bits which is large compared with 1/p, 
where p is the lowest channel error rate to be monitored. The 
integrator output will be stable and proportional to p when p is 
in the range corresponding to decoder error rates lower than 
about 10 2 . As p rises beyond this point, the integrator output 
will rise monotonically , but not proportionally. When p becomes 
greater than about. 11, the integrator output will saturate. This 
is because the number of decoder corrections (metric increases) 
for a rate 1/2 code never exceeds .11 on the average as was dis- 
cussed in the previous section. 

Integration of the sync counter value over a fixed time 
window therefore provides a sensitive measure of p when the decoder 
is putting out useful data. It also is a good indicator of system 
failure. 
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2.2.6 Sensitivity to AGC Inaccuracy . Coded systems which 
make use of receiver outputs quantized to more than two levels 
require an analog-to-digital converter at the modem matched filter 
output, with thresholds that depend on the noise variance. For 
instance, all of the 8-level quantized Viterbi decoder simulations 
reported on thus far have used level thresholds at 0, ±0.5o, ±a, 
and ±1.5a. 

Since the level settings are effectively controlled by the 
automatic gain control (AGC) circuitry in the modem, it is of 
interest to investigate the sensitivity of decoder performance to 
an inaccurate or drifting AGC signal. Fig. 2.2.6 shows the decoder 
performance variation as a function of A-D converter level thres- 
hold spacing (in all cases the thresholds are uniformly spaced) . 
These simulations used the K=5 rate 1/2 code, with E^/Nq = 3.5 db. 
It is evident that Viterbi decoding performance is quite insensi- 
tive to wide variations in AGC gain. In fact, performance is 
essentially constant over a range of spacings from 0.5a to 0.7a. 
This allows for a variation in AGC gain of better than 20% with no 
significant performance degradation. 

2.3 Other Code Rates . 

The preceding sections have concentrated on Viterbi decoding 
of rate 1/2 convolutional codes. Most of the results on perform- 
ance fluctuation due to decoder parameter variation carry over 
qualitatively, if not quantatively , to other code rates. 

Code rates less than 1/2 will buy improved performance at 
the expense of increased bandwidth expansion and more difficult 
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Bit Error Rate 




symbol tracking due to decreased symbol energy to noise ratios. 
Rates above 1/2 conserve bandwidth but are less efficient in 
energy. 

2.3.1 Description of Code Search Program . Optimum short 

constraint length, rate 1/3 codes have been found previously 

(Ref. 2*3). Our efforts in searching for good codes were confined 

to rate 2/3, K«6 and 8, and rate 3/4, K=6 codes. Since the 

number of possible codes is quite large (there are 2 2H rate 2/3, 

K*8, and rate 3/4, K=»6 codes), a fast method was needed to 

evaluate and select, or reject codes. The method chosen uses 

the convolutional code transfer function described in section 

2.2.2 and Appendix A. Before going into the technique in detail, 

it will be instructive to discuss a result which limits the 

number of codes which must be considered. 

The optimum rate 3/4, K=»6 encoder is shown in Fig. 2. 3.1.1. 

The encoder consists of a K stage shift register, as in the rate 

1/2 case. For a general rate k/n code, however, k binary digits 

are shifted into the coder simultaneously. The coder stages are 

connected to n mod-2 adders. Note that the trellis formed by 

(K-k) k 

a rate k/n code has 2 ' states with 2 branches leaving and 

k 

entering each node. This is because any one of 2 groups of k 

binary digits can enter the coder at once. Decoding involves 

k K— k 

making a 2 -wise decision for each of 2 states per k bits 

decoded. 

It can be shown that no generality is lost if the codes are 
restricted in the following way. Suppose we label the mod-2 adders 
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Fig. 2. 3. 1.1 Optimum Rate 3/4, 
K=6 Code Encoder 
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from 1 through k, and concern ourselves with connections to the 

first k encoder stages. Connect adder 1 to stage 1, adder 2 to 

stage 2, . .., and adder k to stage k. Make no other connections 

between the first k stages and the first k adders. This reduces 

k 2 

the number of codes to be searched by a factor of 2 . The 

encoder in Fig. 2. 3. 1.1 is of the type described. 

Recall that the code transfer function (for N=l) is of the 

form 

T (D) = a ( D d f + a 2 D df + 1 + ... + a i+1 D df + 1 + . . . (2.3.1) 

— 6 

where d^ is the code minimum free distance. If we set D=10 , 

and if the rate of growth of the a^ satisfies certain conditions 
(see Appendix A), then T(10 6 ) will be very close to a.^ 10 6d f. 

The code search program has as an input a target value of d f . It 
evaluates the first few terms of the transfer function, with 
D=10 ^ , for each code, and tests to see if their sum exceeds 
10 ^ ^ d f . If it does, this means that the code must have a 
minimum free distance smaller than d^ (or a l > 10^ which is 
impossibly large). Thus, the code is rejected. On the other 
hand, if the iteratively evaluated transfer function remains below 
10 6 ^ d f ^ taken to a sufficiently large number of terms, the 
code's free distance is at least d^, and the code is printed out. 

This technique often results in the generation of many 
candidate codes (or none of the target d^ is too large) . The 
codes are then tested using the numerical transfer function union 
bound program to approximate the bit error rate performance. 

Final code selections are made on this basis. 
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2.3.2 Good Rate 1/3/ 2/3, and 3/4 Codes . Optimum rate 1/3, 
K=3 through 8, rate 2/3, K=*6 and 8, and rate 3/4, K=6 codes are 
tabulated in Tables 2. 3.2.1, 2. 3.2*2, and 2. 3. 2. 3 respectively. 
Included also is the free distance of each code d^, the number of 
bit errors in paths at the minimum distance n g , and the value of 
an upper bound on minimum free distance d£, analogous to that 
obtained for rate 1/2 codes. 

The rate 1/3 codes were reported previously (Ref. 3) and 
are simply repeated here. The rate 2/3 and 3/4 codes are the 
result of the code search program. 

2.3.3 Simulation and Numerical Performance Data . Figure 
2. 3. 3.1 shows bit error rate vs. E b /N Q performance obtained from 
simulations of Viterbi decoding with optimum rate 1/3, K=4,6, and 
8 codes, and 2 and 8 level quantization. The K=4 and 6 results 
were reported elsewhere (Ref. 2) previously. The K=8 code used 
in Ref. 2 was suboptimal; thus, the curve shown here for the 
optimum code is somewhat better than the previously reported 
result. 

Figures 2. 3. 3.2, 2. 3. 3. 3, and 2. 3. 3. 4 show numerical bound 
and simulation performance results for the rate 2/3 K=6, K«=8 and 
rate 3/4 K=6 codes respectively. Simulation curves are for 2 
and 8 level quantization, while the numerical bound curves are, 
as usual, for infinitely fine receiver quantization. 
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K 

Code 

Generators 

d f 

n e 

* 

d f 






3 

111 

111 

101 

8 

3 

8 

4 

1111 

1101 

1011 

10 

6 

10 

5 

11111 

11011 

10101 

12 

12 

12 

6 

111101 

101011 

loom 

13 

1 

13 

7 

1111001 

1100101 

1011011 

14 

1 

15 

8 

11110111 

11011001 

10010101 

16 

1 

17 


Table 2. 3.2.1 Optimum Rate 1/3 Codes 
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K 

Code 

Generators 

d f 

n 

e 

* 

d f 






6 

101111 

011001 

110010 

5 

5 

6 

8 

10110110 

01111001 

11110111 

7 

86 

8 


Table 2. 3.2.2 Optimum Rate 2/3 Codes. 


K 

Code 

Generators 

d f 

n 

e 

* 

d f 

6 

100001 

010011 

001110 

111101 

4 

• 40 

4 


Table 2. 3. 2. 3 Optimum Rate 3/4 Code. 
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2.3.4 Comparison with Rate 1/2 Code . Comparing the per- 
formance data obtained through simulations of Viterbi decoders 
with rate 1/2 (Figs. 2. 2. 4.1, 2. 2. 4. 2, and 2. 2. 4. 3), and rate 
1/3 codes, it is apparent that the latter offers a 0.3 to 0.5 
db improvement over the former for fixed K, in the range reported. 
This is close to the improvement in efficiency of a channel with 
capacity 1/3 compared with one of capacity 1/2, and is therefore 
expected. 

Performance efficiency of the higher rate codes also is 
predictable when compared with the rate 1/2 codes over the range 
with the simulation data spans. The fairest comparisons are 
probably those between decoders with like number of states. 

Thus, the K=6 rate 2/3 data should be compared with the K=5, 
rate 1/2 data. 

Figure 2.3.4 shows the union bounds on performances for the 
rate 2/3, K=6, and rate 1/2, K=5 codes. Both encoders have 16 
states, d^ = 7 for the rate 1/2 code and 5 for the rate 2/3 
codes. At very high E^/No, the rate 1/2 must be superior. This 
is because asymptotically, at high E^/No , the error probability 
goes as 

P e % exp (d f E g /Ng ) = exp (d f RE b /N Q ) 

This gives the rate 1/2 code an advantage of about 0.2 db in the 
limit. 

In Fig. 2.3.4, the difference between the two curves is about 
0.1 db in the error probability range of 10“ 6 to 10" 9 . This small 
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difference is due to the fact that the rate 2/3 code used is a 
particularly good code (it has a minimum distance coefficient of 
only 5 in the union bound for bit error probability) . Some 
additional discussion on decoder complexity vs. rate is contained 
in section 2.4. 
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2 . 4 Viterbi Decoder Implementation* 


2.4.1 Review of Decoder Algorithm . The present discussion 
is specialized to a decoder for a constraint length K=4 code. 

The design principles are the same for other values of K. Six 
features will be discussed in the following sections relating to: 

1) the assignment of metric to the received data (metric 
compression) 

2) the efficient storage of state metrics (overflow-protection) 

3) the design of the decision-memory and the selection of the 
output bit (maximum likelihood selection) 

4) the time sharing of state metric calculation 

5) choice of logic family (TTL or MECL) 

6) relation of cost to constraint length, code rate, speed, etc. 
To set the stage for this discussion, we first review the 

fundamental operations of the Viterbi decoder. The code under 
discussion is that given by the K=4 convolutional encoder of 
Fig. 2. 4. 1.1. The states of the code at any time correspond 
to the contents of the first K-l=3 stages of the shift register. 

The coder is shown in state Oil. It previously was in state 110; 
an input 0 caused a coder transition to state Oil and a coder 
output of the 2 check digits CjC 2 = 01. 


*Many of the ideas in this section concerning metric compression, 
overflow protection and output selection were first formulated 
in Ref. 10, and are the subject of patent applications either in 
preparation or pending. 
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A 4 state transition diagram for states 110, 111, Oil, and 
111 is given in Fig. 2. 4. 1.2. For any value of K, sets of 4 
states group together with xO and xl on the left and Ox and lx 
on the right, where x is any k-2 bit sequence. For good codes, 
both the first and last encoder shift register stages are con- 
nected to all the mod-2 adders. For these codes, the check 
digits c j c 2 for the group of 4 are always complementary as 
depicted in Fig. 2.4. 1.3. 

A block diagram of a Viterbi decoder is shown in Fig. 2. 4. 1.4. 
The channel and modem cause the conversion of the transmitted 
digits CjC 2 to received digits rjr 2 . From r 1 r 2 , 4 "metrics" 
are calculated by the input section, R Q 0 , R fl t , Rj , Rj ; , cor- 
responding to the 4 possible values of CjC^ On the basis of 
these, the decoder must decide, for all values of x, whether 
state Ox was entered from state xO or state xl and similarly 
whether state lx was entered from state xO or state xl. These 
decisions are made by the ACS (add-compare-select) circuits. 

These binary decisions are denoted by the variables D^ x . For i=0 
or 1, D. =0 indicates a decision in favor of the transition 

IX 

from state xO to state ix, v/hereas D. =1 indicates a decision 

' xx 

in favor of the transition xl to ix. 

The decisions are made on the basis of metrics associated 

with each state. Let M . denote the metric associated with 

xj 

state x j , j=0, 1. Then the decoder makes decisions — see 
Figure 2. 4. 1.3 — as follows (the convention that small metrics 
are "good" is used) : 
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1 


Input Bit 


Fig. 


Fig. 


Fig. 



Note : (+J denotes modulo-2 addition 


.4.1.1 K=4 Convolutional Encoder, Rate = % 



Oil 


111 


Initial Check Final 

State Digits State 


.4.1.2 4-State Transition Diagram for a One 
Input Step for Coder of Fig. 2.4.1 


(0 input causes transition to upper state; 
1 input causes transition to lower state) 



.4.1.3 General 4-State 1 Step Transition Diagram 
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Fig. 2. 4. 1.4 
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VITERBI DECODER 
BLOCK DIAGRAM 







M.+R < M , + R— - 

XO C i C 2 — xl CiC 2 


M n + R > M , + R- - 
XU C\C 2 xl CiC 2 


M-+R-- M , + R 

xO C_iC 2 £ xl c 1 c 2 


M _ + R- - > M,+R 

xO C 1 C 2 xl CjC 2 


D. = 0 
Ox 


D 


Ox 


D. =0 
lx 


(2.4.1) 


D. =1 
lx 


Following the decisions, the set of D's (for K=4 , Dg 00 , 

Dg 0 1 f D 010» D 011f D 100> D 101 , D ll0» D l 1 1 ) are passed to the 
decision-memory section and are used to set the new values of 
the state metrics (exclusive of overflow-protection; see Section 
2.4.3). For example, if Dq^ = 1, the new value of the state 

metric M rt is M . + R- - . 

Ox xl Cic 2 

For K=4 , x takes on the values 00, 01, 10, and 11. Thus, 

4 sets of decisions corresponding to (2. 4. 1.2) must be performed 
by the decoder. To implement the add and compare operations 
at high speed and reasonable cost, it is essential to minimize 
the number of bits used to represent the M's and R's. We dis- 
cuss the minimization of the R representation, or metric compres- 
sion, in Section 2.4.2 and the minimization of the M representation, 
overflow-protection, in Section 2.4.3. The method of overflow- 
protection has two dividends; it permits an inexpensive imple- 
mentation of the maximum-likelihood output selection as discussed 
in Section 2.4.6 and of the code synchronization as discussed in' 
Section 2.4.7. Tradeoffs in speed, complexity, logic family, 
constraint length, etc, are discussed in Section 2.4.8. 
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2.4.2 Metric Compression . The smallest achievable error 
probability would result if the received symbols, rj and r 2 , 
were equal to the unquantized (or infinitely finely quantized) 
outputs of the appropriate matched filters, say u t and u 2 . To 
permit digital implementation of the adder function, however, 

Uj and u 2 are quantized to Q levels (For example, Q=2 denotes 
hard decisions where only the sign of the Uj's is kept). The 
loss in performance incurred by using Q=8 rather than Q==° is 
less than 0.2 db for a reasonable choice of quantizer thres- 
holds . 

For the decoder to be maximum likelihood, the branch metric, 
FL j , should equal the log likelihood of the digits r : and r^ 
received for that branch given that i and j are transmitted: 

R. . = log P(r | i trans) + log P(r | j transmitted) 

1 J * 2 

For a digital implementation, however, it is also necessary to 
quantize the {R.^} to a small number of levels. This quanti- 
zation causes an additional performance loss. 

We are interested in representing the branch metrics R 0 0 , 

R 0 j , Rj 0 , and R t t in a minimum number of bits. As noted earlier, 

the use of the quantized numbers r : and r 2 themselves to form the 

metrics provides system performance within 0.25 db (with Q=8) of 

the value achieved with Q=°°. Of course, as Q-*°°, use of r x and 

r„ becomes identical to use of u, and u and hence is maximum 
2 1 2 
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likelihood, while for Q=2, use of binary-valued rj and r 2 is the 
same as the use of Hamming distance and again is maximum likeli- 
hood. 

Consider a Q level quantizer with the Q quantizer levels 
labelled by the integers 0, 1, 2, Q-l from most negative 

to most positive. (Any other labelling scheme can easily be 
mapped into this labelling scheme.) 

If ri and r 2 denote the received symbols and take on the 
above values, then a choice of branch metrics shown to be accep- 
table by computer simulation is: 

Roo = r x + r 2 
Roi = r l + (Q-l-r 2 ) 

Rfo = (Q-l-r i ) + r 2 

Rfl = (Q-l-r! ) + (Q-l-r 2 ) 

The LINKABIT technique of metric compression permits a reduction 

by 1 bit in the number of bits required to represent these numbers 

without degrading performance. Subtracting the same constant 

from each of the R. . does not affect decoder decisions (see Eq. 

ID 

2.4.1), We choose to subtract the smallest of the R^ from each of 
the R „ , thus yielding (at least) one value equal to zero and all 
the others non-negative. If R 0 i is smallest, for example, then 
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R oo 

- R oo 

- R oi 

= 2r 2 - Q+l 

R of 

- R oi 

“ R 0 1 

= 0 

R f 6 

= R 10 

- R 0 'i 

= 2r 2 - 2r ! 

R n 

= R fi 

- Roi 

= -2r i + Q— 1 


Observe in Eqs. 2.4.2 that if Q is odd then each of the R — 
is even and the least significant bit may be discarded. This is 
true in general and is the basis for the LINKABIT metric com- 
pression technique. If Q is even, however, this compression is 
not possible. Compression has a great impact on the amount of 
hardware required since, without compression, an extra bit is 
required throughout the arithmetic section and metric storage. 

This produces the peculiar result that 20% more hardware is 
required to implement Q=8 than Q=9. Fortunately, there is a 
simple solution to the problem of providing compression with an 
even value of Q, namely, to regard Q=8 as though it were Q=7 
by mapping the two central zones of Q=8 into one zone. Although 
this results in slightly nonoptimum metrics, the degradation is 
quite small, especially at low error rates. 

The above metric compression scheme is most easily implemented 
by using a read-only memory if the decoder is for data rates less 
than 10 Mbps. For rate 1/2, there are two received symbols, 
each quantized to three bits if Q=8 for a total of six bits. 

The ROM look up table must therefore have 64 entries. Since four 
branch metrics must be computed, each being a three bit number. 
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each word of the look up table must contain twelve bits. The 
table can be implemented by using two 512 bit ROM's. 

A logic diagram of a TTL input section using ROM's is shown 
in Fig. 2. 4. 2.1. The circuit accepts either serial BPSK with the 
received digits, r t and r 2 , interleaved or QPSK with digits r, and 
r 2 in parallel. The circuit then provides node sync by the 
selective insertion of a one symbol time delay in BPSK or by the 
selective interchange of the symbols in QPSK. If an interchange 
occurs, then ri is complemented. The sync circuit (explained in 
section 2.4.7), controls the bit insertion or interchange opera- 
tions. 

The symbols then address the 1024 bit ROM. The ROM is 
organized into 64 words of 12 bits each. The ROM outputs are the 
four branch metrics. The remaining circuitry subtracts four from 
the branch metrics when normalization (explained in section 2.4.3) 
is required. 

At data rates higher than 10Mbps a different approach is 
used. The quantizer levels are labeled from -3 to +3 instead of 
from 0 to 6 as in the previous case. We then consider the symbols 
as sign-magnitude numbers, i.e., rj = s, 1 t , where s 2 is the sign 
and tj is the magnitude. The compressed metrics are then given by 
the following table: 


S 1 

s 2 

0 0 

R u 

R i» 

R n 

0 

0 

0 

^2 


t;+t 2 

0 

1 


0 

ti+t* 


mm 

0 

*1 

ti+tj 

0 

2 

i 

1 

t,+t 2 

t, 

^2 

0 
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Figure 2. 4. 2.1 

Logic Diagram, TTL Input Section 





The assignments of this table are easily implemented as shown 
in Fig. 2. 4. 2. 2. This circuit is preceded by node sync circuitry 
as in the previous example. The magnitudes are added and the 
branch metrics are obtained by using a multiplexer that is con- 
trolled by Sj and s 2 to route 0, t Jf t 2 and tj + t 2 to the appro- 
priate branch metrics. Normalization can be obtained by sub- 
tracting four as shown in the diagram. 

2.4.3 Overflow Protection . In Section 2.4.2, we demon- 
strate how to compress the metrics to permit R„ 0 , R 0 1 , Rj , and 
Rj j to be represented in the minimum number of bits. We now 

K-i 

concentrate on minimizing the maximum size of the 2 state 
metrics, M 0 , M x , ..., M The technique is applicable to all 

K although the example is for K=4. 

Consider the coder state diagram given in Fig. 2.4. 3.1. It 
is possible to calculate the minimum Hamming distance in going 
from the 000-state to each of the other states. These numbers 
are shown in the square boxes adjacent to each state. Note that 
the maximum Hamming distance is 4. Because of the group property 
of the code, this means that any state is at most Hamming distance 
4 from any other state. 

This observation is critical to minimizing the number of bits 
required to represent the M's. For Q=8, Hamming distance 1 can 
imply an actual metric difference no greater than 3. Thus, it is 
possible for one state, say the 000 state, to have state metric 
M QOO =0 while another state, for example the state 001, has state 
metric M 00l =12. It is not possible for any state metric to be 
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Fig. 2. 4. 3.1 - State Diagram with CiC 2 Labelling Branches 

Minimum Hamming Distance of Each State from the 
000 State is Shown in Square Box Adjacent to State 
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larger than 12 when M 000 “0 (or any other M=0) , since the state in 
question can be reached from state 000 with at most Hamming 
distance 4 and hence compressed metric = 12. 

We conclude that the difference between the smallest and 
largest state metrics never exceeds 12. It is therefore possible 
to represent the state metrics with 4 bits if a means of prevent- 
ing overflows is also provided. To do so, we first observe that 
in a single transition, the smaller state metric in a pair of 
metrics can increase by at most 3. Furthermore, if the spread is 
equal to 12, the largest metric cannot increase at all. These 
claims are verified by examining the state-pair metric transition 
diagram in Fig. 2.4.3.2a and the possible sets of compressed 
metrics in Fig. 2.4.3.2b. The state-metrics are shown on the left 
in Fig. 2.4.3.2a for a case in which the spread between smallest 
and largest metric is the maximum value, 12. The range of the 
state metrics at the outputs are shown on the right (recall that 
the smaller (better) metric of the 2 metrics accessible to a 
state is chosen as the new state metric; see Equation 2.4.1. 

We conclude that if the smallest state metric is in the range 
0 to 4, the largest metric is less than or equal to 15 and hence 
does not overflow a 4 bit register. Furthermore, since the maxi- 
mum increase in the smallest metric is 3, subtracting 4 from all 
state metrics whenever all of them exceed 3 prevents the set of 
state metrics from moving out of the range 0-15. This is the 
overflow-protection strategy that we have adopted. Note that we 
could equally well subtract 3 whenever the metrics are all greater 
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than or equal to 3 but the test for 1VL j greater or equal to 4 and 
the subtraction of 4 is more easily accomplished in hardware than 
similar operations using 3. 

With the combination of metric compression and overflow 
protection, the additions M x + involve the addition of a 4 

bit and a 3 bit number. 


2.4.4 Storage of State Metrics . As explained in previous 

K-i 

sections, 2 state metric values must be stored for each of 
the possible encoder states. At each bit time, 2 additions must 

K-i 

be performed followed by 2 comparisons and select operations. 

If the decoder is operating at a relatively high bit rate, then 
all of the operations must be performed in parallel; however, if 
the data rate is low to moderate, then the arithmetic operations 
may be performed partly or completely in serial. If the operations 
must be performed completely in parallel, there is no organization 
problem in the storage of state metrics since they must all be 
simultaneously accessible. If, however, the operations may be 
performed partly or completely in serial, then more efficient 
organization of the state metric storage becomes a possibility. 

As explained in previous sections, each arithmetic operation 


consists of the addition of the branch metrics R. . and R-r^ to the 
state metrics M Qx and M^ x resulting in metrics M xQ and M xl> 

Suppose, for example, that completely serial operation is possible. 
The state metrics can be stored in a random-access memory such 


that state metric M Qx is stored in location Ox. The problem with 
this organization is that the result of the computation produces 
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state metric M, 


x ^. Now state metric M ^ may not yet have been 
accessed in the present cycle of computations, so the result of 
the present computation may not yet be stored in this location. 
Therefore, state metric storage must be duplicated to temporarily 
store the results of the computations. 

A more efficient technique makes use of the following 
property: if the parity of state Ox is even, then the parity of 

lx is odd and vice versa. The state metrics may be stored in two 
sets of random-access memories, one of which contains the metrics 
of all states having even parity and the other containing metrics 
of all states having odd parity. For example, if x has even 
parity, then Ox has even parity and lx has odd parity. In per- 
forming a computation, the metrics of states Ox and lx are read 
out of the memories, the computation is performed and the result- 
ing metrics xO and xl are written into the memories in the 
locations from which the original metrics were taken. Obviously, 
the metric of state Ox will be found in a different location in 
the memory at each computation. However, the progression of 
locations through which the metric Ox passes can be very simply 
computed. 

The preceding example of metric storage using random- 
access memories is useful for up to two computations performed in 
parallel. If, however, more computations must be performed in 
parallel, random-access memories may no longer be used since the 
organization of such a random-access memory would become unduly 
complicated. 
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If more than two computations must be performed in parallel, 
then all state metrics must be simultaneously accessible and the 
arithmetic sections are switched between state metrics by the use 
of multiplexers » Also, the storage must be duplicated to tempor- 
arily store the results of the computations. The requirement for 
multiplexers and duplicated storage tends to offset the savings 
achieved by sharing a small number of arithmetic units. 


2,4.5 Arithmetic Logical Section . The principle functions 
of the arithmetic-logical section are: 

1. to implement the decision logic of equations (2.4.1), 
generating a set of decisions , 

2. to generate updated state metrics and to transfer the 
new metrics into the state metric registers following a 
clock, 

3. to detect an increase in the minimum state metric above 
the value 3 and to subtract 4 from all state metrics 
following such a detection in accordance with the over- 
flow management strategy, 

4. to set the values of the output decision gating variable 



An Xj is set equal to one only if its corresponding state metric 
is less than or equal to 3. (This function will be explained in 
the next section. ) 

These functions are implemented in the ACS (add-compare- 
select) functional block depicted in Fig. 2 . 4 „ 5 , Two ACS's are 
required for each pair of states if all computations are to be 
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carried out in parallel. Since the ACS block is the critical 
block both in terms of number of components and maximum speed 
capability, complete designs for this block have been worked out 
in both ECL logic and TTL logic using MSI. The ECL design can be 
constructed in three different versions. One version, using 
MECL III throughout, will obtain the maximum possible speed. The 
other two designs use MECL 11^ and MECL II for lower speed and 
greater economy. Several different TTL designs were worked out 
but the one presented here was the best; both in terms of number 
of parts, cost, and speed so this will be the only one presented. 
Both designs are specialized to K=4 in that a 4 bit state metric 
and 3 bit branch metric are assumed. 

Normalization can be obtained in two ways: 1) by determin- 

ing that all state metrics exceed 4 and subtracting 4 from all of 
them during the same computation, or 2) determining that all 
state metrics exceed 4 and subtracting 4 from the branch metrics 
for the next computation. The first approach is suitable for 
small constraint lengths at high speeds where all arithmetic 
operations are performed in parallel. The second technique must 
be used at lower speeds and higher constraint lengths where partly 
or completely serial arithmetic operations are performed. It is 
clear that the first technique cannot be used in this case since 
most of the state metrics are computed and restored before it is 
known whether or not a subtraction of 4 is necessary. At high 
speeds where fully parallel operation is used and at lower 
constraint lengths, there is no particular advantage of one 
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technique over the other except that the first technique will be 
easier to understand and debug. 


2.4.5 .1 ECL Arithmetic-Logic Unit . A block diagram of the 
ECL-ACS unit is shown in Fig. 2. 4. 5.1, a, b, and c. Sheets a and 
b show the adders that add the state metrics to the branch metrics. 
Sheet c shows the comparator and selector. The adder used in this 
ACS unit is a form of adder known as the carry-save adder in which 
ripple carries are used between the stages. The designs of the 
carry circuits are such that there is only one logic delay per 
carry. Since only three carries must be generated, the total 
carry propagation delay is then only three logic delays. For an 
adder this size, there will be no speed advantage to using look- 
ahead-carry over this ripple carry technique, and furthermore, 
the ripple carry technique results in a more economical design. 

The comparator is implemented by subtracting the two sums 
from each other. In this application, we are not interested in 
the actual difference but only in the carry out of the most 
significant stage of the subtractor. Hence, we need only imple- 
ment the carry portion of the subtractor. Another interesting 
feature of this design is that the carries will be rippling 
through the adders simultaneously with the carries rippling 
through the comparator, so that the total add and subtract time 
is not equal to the sum of the add plus the subtract time but 
rather one logic delay more than the add time. 

Another function that must be performed is determining 
whether the result is less than 4. If none of the results of the 
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Fig. 2. 4. 5.1 b 
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computations of the ACS unit produces a result less than 4, then 
4 must be subtracted from all of the results in order to obtain 
the required normalization. The result will be less than 4 if 
either of the two sums is less than 4 since the smaller will be 
selected. Each of the sums will be less than 4 if the carry out 
of the second stage, the third and fourth bits of the state 
metric, and the third bit of the branch metric are all zero. Thus, 
the less than 4 signal may be obtained in only three logic delays. 
The less than 4 signals from all of the ACS units are then ORed 
together. If the result of the ORing operation is zero, then all 
resulting state metrics will be greater than 4. While this is 
taking place, 4 is subtracted from each of the sums. The output 
selector will then select first the smaller of two resulting sums 
and then select either the smaller or the smaller minus four, 
depending upon whether normalization is required. The results are 
then stored in the state metric storage flip flops, thus com- 
pleting the computation. 

Chip counts and typical computation rates have been obtained 
for this design implemented with three different versions of ECL 
logic, ME CL III, ME CL 11^, and ME CL II. The ME CL III design will 
require 25 chips per ACS including the state metric storage. The 
typical maximum computation rate is 90 MHz. The MECL Il^g design 
requires 35 chips and will operate at 40 MHz. The MECL II design 
requires 25 chips and will operate at 25 MHz. The MECL III version 
will be considerably more expensive than either of the MECL II 
versions since much more sophisticated techniques are required to 
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package the MECL III devices. These advanced techniques include 
the use of strip transmission lines, multilayer circuit boards, 
and complicated cooling because of the high power dissipation of 
the MECL III devices. The MECL II and MECL II\ designs may be 
packaged using approximately the same techniques as required for 
TTL. The MECL II devices are, however, slightly more expensive 
on a per chip basis and the level of integration available is 
somewhat less than that available with TTL, as will be seen in 
the following example of a TTL design. 

2. 4. 5. 2 TTL ACS Unit . A logic diagram of the TTL arithmetic 
section is shown in Fig. 2. 4. 5. 2. The inputs to the arithmetic 
section are two state metrics and two branch metrics. The state 
metrics are each represented as 4 bit binary numbers; the branch 
metrics are represented as 3 bit binary number plus a sign bit. 
(Subtracting four from the branch metrics for normalization can 
result in negative values.) Both the branch metrics and the state 
metrics are represented in complemented form. The two additions 
are performed by two SN74181 4-bit arithmetic units operating in 
the add mode. The SN74181 is a 4-bit full adder with full carry 
look ahead. The sum is produced, typically, in 24 nanoseconds. 

If the sum is larger than 15 a carry-out of the 4th stage of addi- 
tion is produced. A carry-out of the 4th position is also pro- 
duced when the branch metric is negative. Therefore the 5th bit 
of the sum is generated by NANDing the carry-out with the sign of 
the branch metric. 
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The smaller of the two resulting sums is determined by sub- 
tracting one from the other using an SN74181 in the subtract mode. 
A carry-out of the subtract will result if sum B is larger than 
sum A. Note that we must compare two 5 bit numbers and the 
SN74181 is only a 4 bit subtsactor. The 5 bit comparison is per- 
formed very simply by subtracting the 4 moat significant bits of 
the two sums in the SN74181. Since we are interested only in the 
carries in the subtraction process, we need only provide the 
carry-out of the subtraction of the two least significant bits as 
input to the carry-in of the 4 bit subtractor. Thus a 5 bit 
comparison cam be accomplished with a single adder chip plus a 
single two input NAND gate. The resulting decision is stored in 
the decision memory section. 

The smaller of the two sums is selected by the four and/or/ 
invert gates provided by the two SN74H51 chips. The an/or /invert 
gates are connected as single pole-double throw switches. The 
result is stored in the state metoric register. 

It is also necessary to determine whether' the output is 
greater than or equal to 4, or greater than or equal to 8. The 
greater than or equal to 4 signal is used to control normalization 
and to determine the decoder output bit. The greater than or 
equal to 8 signal is used for the later purpose when all metrics 
are greater than or equal to 4. Greater than or equal to 4 
is determined by examining the three most significant bits of the 
two sums. If the 3 most significant bits are all equal to 1, then 
the number is less than 4 (remember that the sums are expressed in 
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complemented form) . Greater than or equal to 8 is determined by 
examining the two most significant bits. If both are equal to 1, 
then the result is less than 8. These functions are performed by 
NAND gates which invert the results, converting less than 4, for 
example, into greater than or equal to 4. If this logic were 
performed on the metric output, the time required would be added 
to the total ACS time. Instead, the greater than 4 and greater 
than 8 detection is performed on the two sums prior to comparison 
and selection. The desired result is obtained since, if the 
smaller of the two sums is less than 4, then at least one of the 
greater than or equal to 4 outputs is 0. 

Each arithmetic (ACS) unit can be implemented with 10 inte- 
grated circuits including the metric storage. It should be noted 
that 3 of these IC's are large (24 pin) and relatively expensive 
compared to the other IC's. Including the propagation delay of 
the metric storage units, an arithmetic operation can be per- 
formed in under 100 nanoseconds. 


2.4.6 Decision Memory and Output Selection . As each check 
digit pair is input to the decoder, decisions D. are made by the 

-L A 

arithmetic section as explained in previous sections and trans- 
mitted to the memory-output section* For each state, 16 bits are 
stored as shown in Fig. 2.4.6. The first bit stored for state ix 
is the most recent decision D^ x . The remaining bits reflect the 
results of earlier decisions. 

Consider decision D Qx . If D Qx = 0, the decoder has decided 

that for the most recently received check digits, state xO is more 
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likely than state xl to have been the predecessor of state Ox. 
Conversely , if D Qx = 1, state xl is a more likely predecessor to 
Ox than is xO. When = i, the 16 bits associated with state 

xi are shifted right 1 stage and transferred into the stages 
associated with state Ox. The first stage for state Ox is set 
equal to i, as shown in the figure. Note that the boxes denoted by 
L are flip-flops which each store one bit while the boxes labeled 
M are switches which transfer one of the two inputs to the output 
depending on the value of the appropriate D. 

The bit shifted out of the 16th stage of the memory for 
each state is either discarded or, for one state, selected as the 
output from the decoder. The optimum decoder would select as 
output the bit from the state with the smallest value of M. 
Simulation results have shown that a memory of 12 bits per state 
would suffice if the smallest value of M were used. Such a 
selection is difficult to implement in hardware, since an examin- 
ation of all eight state metric values is necessary in order to 
select the smallest. 

An alternate approach is to always select the output from, 
say, the 000 state. This approach, ignoring the state metrics 
entirely, requires a memory of 24 bits per state to keep the 
degradation small. 

The approach adopted by Linkabit is based on the overflow- 
management strategy and is almost optimum. The Linkabit decoder 
memory is conservatively extended to 16 bits to account for the 
slight nonoptimality. Recall that the overflow-management 



strategy forces the smallest state metric to have a value between 
0 and 3. Because nearest states have a Hamming distance of 2, the 
compressed metric separation generally exceeds 4 , and in general 
only 1 state has a metric between 0 and 3. Moreover, if 2 (or 
more) states had metrics less than or equal to 3, they would 
tend to have similar past histories and hence the same 16th bit 
in memory. 

The selection mechanism is based on examining each state 
metric and setting the variables x^ = 0 if >_ 4 and equal to 1 
otherwise, j =* 000, 001, ..., 111. As shown in Fig. 2.4.6, the 
variable x^ is used to gate the output from the 16th stage of 
memory for state j via an OR gate to the decoder output. If 
more than one x^ is non-zero, the corresponding bits are ORed 
together to form the decoder output. This method is nearly 
optimum and very inexpensive to implement. The slight non- 
optimality is more than offset by the extension of the decision 
memories to 16 rather than 12 bits. Examples of the design of 
the memory output section are given below for both TTL and ECL 
logic families. 

2. 4. 6.1 ECL Memory Output Section . The following design is 
based upon the use of the ME CL II MC1040 Quad Latch. This circuit 
contains 4 latches with individual output gates. The circuit can 
be used to provide both the storage and the switching capability 
required for the decision memory. One and one third stages of 
storage per chip are obtainable using this integrated circuit. 

Each stage is stored in three latches and three different chips 
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as shown in Figure 2. 4. 6.1. The three latches are arranged as 
one master and two slave flip-flops. The output gating on the 
slave flip-flops is used to perform the switching operations. 

The total number of bits stored for K=4 is 8 x 16 = 128. At 
4/3 bits per chip, this requires 96 chips. The output gating 
requires an additional three chips for a total of 99. This 
MECL II quad latch circuit can be used at speeds up to 40 MHz. 

For higher speeds, MECL III must be used. Motorola is 
planning the introduction in the near future of a MECL III cir- 
cuit equivalent to the MECL II MC1040. When this circuit is 
available, the memory output section for speeds up to 90 Megabits 
can be implemented in the same way as in the MECL II case. Until 
this chip becomes available, the decision memory can be con- 
structed using one MC1670 flip-flop preceded by two 2-input NOR 
gates to accomplish the switching for each stage. One and one 
half chips are required oer stage of storage. For K=4 , this 
requires 192 chips. 

2 .4.6.2 TTL Memory Output Section . There is only one TTL 
MSI circuit that can be easily used to form the memory output 
section, the SN74L98, which consists of four storage registers, 
each of which is preceded by a two way switch. Using this cir- 
cuit, 1/4 IC would be required per stage of storage. In the case 
of K=4, this would require 32 IC's. This is a low power, low 
speed IC with a maximum clock rate of only 3MHz. For higher 
speeds, a more brute force approach is necessary. For speeds up 
to 10 Megabits, the storage register can be obtained by using the 


- 111 - 



To 000 state 
Master Latch 
(bits j+1 to j+4 ) 


000 
STATE 

MASTER 
LATCH 

BITS 

tLto. j + 3 

d To^ 


001 

STATE 

MASTER 
LATCH 

BITS 
ti to j + 3 

i — r 

C 0 


C IS CLOCK INPUT 



000 

STATE 

SLAVE 

LATCH 


XI 

C Do o o 


001 

STATE 

SLAVE 

LATCH 


Oooo To 000 state 
Master Latch 
(bits j+1 to j+4) 


"500 

STATE 

SLAVE 

LATCH 


“I JT 


MOO 


001 

STATE 

SLAVE 

LATCH 


rr 


M00 


Fig. 2. 4. 6.1 - Four stages of 000 and 001 state memories 
using quad latches. Output gating on 
latches provides mechanism for selective 
memory transfer and shift. 
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SN7495 which contains 4 storage registers, together with a quad 
2-input multiplexer chip. Thus, 1/2 IC would be required per bit 
of storage for a total of 64 chips in the case of K=4. 

At lower data rates, it is possible to make use of a rela- 
tive speed factor to decrease the number of components required in 
the decision memory. For example, at data rates below 500 Kilobits, 
the memory could be constructed using 16 SN7491 devices which are 
8 bit shift registers. They would be connected as 8 16 bit shift 
registers. Since a relative speed factor is available, the 
register transfers can be accomplished by serially shifting the 
contents of the 8 registers through 8 2-input multiplexers. 

Thus, the memory could be implemented using only 18 IC's. At 
lower data rates, two quad 16 bit MOS static shift registers 
could be used to form the memory together with 2 chips for the 
switching, yielding a total of only 4 IC's for the whole memory 
section. 


2.4.7 Synchronization Section . The purpose of the sync 
section is to obtain node synchronization. This is the only 
synchronization required by a transparent code Viterbi decoder. 

If the code used were nontransparent, then the sync section could 
also resolve the 180° phase ambiguity of the PSK of QPSK demod. 

The sync circuit operates by comparing the rate of increase 
of the best state metric with its expected value. If the rate is 
too high then it is assumed that a bad node sync state exists 
and the node sync state is changed. This is accomplished, in the 

case of PSK, by either inserting or deleting a one symbol time 
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delay in the input section. If the input is from a QPSK modem 
then the sync state is changed by interchanging the received 
symbols r t and r 2 and inverting r 1# 

A logic diagram of the sync circuit is shown in Fig. 2.4.7. 

As can be seen from the diagram, the LINKABIT synchronization 
technique is extremely simple. This, of course, demonstrates 
one of the inherent advantages of convolutional codes over block 
codes. The circuit operates as follows: Every time a metric 

normalization occurs, an up/down counter is counted down by one 
count. The rate at which this occurs is proportional to the rate 
of increase of the best state metric since normalization does not 
occur unless the metric value of all states exceeds 3. The actual 
rate is slightly greater than the hard decision error rate 
divided by four when the node sync state is correct. When the 
node sync state is incorrect, the normalization rate is much 
higher, approximately the bit rate divided by eight. 

The up/down counter is counted up at the bit rate divided 
by a constant. The optimum choice for this constant is approxi- 
mately 16. The up/down counter is not permitted to overflow, i.e., 
when the counter reaches all ones, the up count input is gated 
off. If the counter underflows, then the node sync state is 
changed. When the node sync state is correct, the average drift 
of the counter will be up and the counter will spend most of its 
time in the all ones state with occasional short downward ex- 
cursions. When the node sync state is incorrect, the average drift 
of the counter is down and the counter will soon underflow, thus 
changing the node sync state. Obviously, there is a tradeoff 
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involved in the size of the up/down counter. The larger the 
counter, the longer is the time required for synchronization, 
whereas, the shorter the counter, the larger the probability of 
false loss of node sync. The optimum size of the counter has not 
been determined exactly but is known to be between four and six 
stages. 

The node sync state is actually changed by allowing the 
borrow-out of the most significant counter stage to toggle a 
flip-flop. The state of the flip-flop is the node sync state. 
Thus, when the borrow-out of the up/down counter occurs on counter 
underflow, the node sync state is changed. The node sync state 
is sent to the input section where it controls the addition or 
deletion of the one symbol time delay in PSK. In QPSK, the node 
sync state controls the interchanging of received symbols rj and 
r 2 . 

The up/down counter is implemented in TTL by using two 
SN74191 four stage up/down counter chips. If the counter is to 
be 6 stages long, then the first six stages of the two cascaded 
dividers form the required counter. The seventh stage becomes 
the node sync state flip-flop. Overflow is inhibited by ANDing 
together the first six stages of the counter. When the first 
six stages of the counter are all ones, further up counts are 
inhibited. A MECL circuit can be implemented in a similar 
manner. 
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2 . 4 . 8 Trade-Off Section . 

2.4. 8 0 1 Cost-Complexity Trade-Offs . This section presents 
the results of the cost-complexity tradeoff study for the Viterbi 
decoder, based on the K=4 , rate 1/2, Q=7 designs presented in 
previous sections. Parts counts have been tabulated for four 
different designs; a 10 megabit TTL decoder, a 25 megabit MECL 
II decoder, a 40 megabit MECL 11*5, and a 90 megabit MECL III 
decoder. Weighting factors have been assigned to the different 
logic families to reflect differences in parts cost, design cost, 
and packaging cost. The results are presented in the table below. 
The weighting factors used were; 1 for TTL , 1.5 for MECL II, 2.5 
for MECL 11%, and 7 for MECL III. It should be pointed out that 
these weighting factors are based on a number of subjective 
judgements on the part of the author, and should not be considered 
to be exact or invariant. Any number of things could cause these 
relationships to change in time, for example, a continuation of 
the present TTL price war, new MSI circuit announcements, etc. 


Logic 

Family 

Max 

Data 

Rate 

(Mbps) 

No. 

of 

IC'S 

Relative 

Cost 

TTL 

MSI 

10 

185 

1 

MECL II 

25 

365 

3 

MECL 11% 

40 

450 

6 

MECL III 

90 

470 

18 
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The best way to build a 40 megabit Viterbi decoder will now 
be considered. With the designs that have been worked out in the 
previous sections there are four possibilities. The decoder 
could be built by using MECL III logic, or by using MECL 11% 
logic, or by building two 25 megabit MECL II decoders operating 
in parallel, or by building four 10 megabit TTL decoders operat- 
ing in parallel. 

The last two examples require that an overhead factor of 
approximately 10% be used to account for the cost of the extra 
encoders and the cost of tying the decoders together. The MECL III 
decoder would have a relative cost of 18, the MECL 11% a relative 
cost of 6, the MECL II decoder a relative cost of 6.6, and the 
TTL decoder a relative cost of 4.4. Thus, it appears that the TTL 
design would be the most inexpensive way to obtain a 40 megabit 
Viterbi decoder. Clearly, the MECL III design is not desirable 
from a cost standpoint. The TTL design appears to be superior 
over the whole range from 10 to 100 Megabits, however, a change 
in the weighting factors used could alter this conclusion. An 
additional advantage to the paralleling approach to obtain high 
speeds is that it is very easy to include provisions for fault 
isolation and maintenance. For example, suppose that a 40 mega- 
bit decoder were desired. This could be provided by building 
four TTL 10 megabit decoders. If a fifth decoder were provided, 
then fault isolation could be obtained by switching the spare 
decoder in parallel with each of the other decoders, and comparing 
the output. When a discrepancy is found, the fault isolation 
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circuitry would then automatically determine which of the five 
decoders is defective and automatically switch the spare decoder 
into its place. The maintenance of the faulty decoder would then 
take place while the remaining four decoders continue to operate 
on line. 

In this section we have determined the relationship between 
data rate and cost. In the following sections we will determine 
the relationship between cost and other decoding parameters such 
as code rate, constraint length, and quantization level. 

2.4. 8. 2 Cost Vs. Constraint Length . In the previous sec- 
tion we have shown that the relationship between cost and data 
rate is approximately linear. The relationship between cost and 
constraint length is, however, exponential. As the constraint 
length increases by one, the number of states required doubles. 
This will require doubling the total number of arithmetic units 
and more than doubling the decision memory. The decision memory 
will double in one dimension and increase linearly in the other 
dimension. Also the number of bits required to represent the 
state metrics will increase as the constraint length increases, 
since the distance of the code increases, thereby increasing the 
spread between the best and worst state metrics. Overall the 
decoder complexity relative to the K=4 design goes approximately 
as 

K • 2 ^ K-1 ^ 

4 • 2 3 
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K-6 

The cost relative to the K=4 decoder design will be then K • 2 
For example, a K=6 decoder will be approximately 6 times as 
complex and therefore 6 times as expensive as a K=4 decoder. 

2. 4. 8. 3 Cost vs. Code Rate . Changing the denominator of the 
code rate has its primary effect on the input section of a Viterbi 
decoder. In general there are 2 d branch metrics to be computed 
where d is the denominator of the rate. Thus the input section 
grows exponentially with the denominator of the rate. The size 
of the largest branch metric grows linearly with d. This causes 
an additional increase in complexity as a function of ln 2 d. This 
occurs both in the input section and the arithmetic section since 
larger branch metrics require larger state metrics. The com- 
plexity multiplying factor relative to rate 1/2 is approximately 

46 + 2 d ln 2 d 
50 

The numerator of the rate affects decoder complexity in a 
rather complicated way. The number of states is an exponentially 
decreasing function of the numerator of the rate, thus decreasing 
the state metric storage requirement. However, the complexity of 
the arithmetic operations increases exponentially with the 
numerator of the rate. Thus, for rate 2/4, we do twice as com- 
plex an arithmetic operation on half as many quantities as we do 
for rate 1/2. The net result is no change in arithmetic hardware 
for a fully parallel decoder. The decision memory decreases 
exponentially with increasing numerator. The overall relative 


- 120 - 



complexity multiplying factor is approximately .6 + .8 x 2~ n 
where n is the numerator of the rate. This factor is relative to 
the rate 1/2, Q=8, K=4, TTL decoder. 

2. 4. 8. 4 Cost vs. Quantization . The number of quantizer 
levels is linearly related to the size of the branch and state 
metrics. Thus, the complexity is a function of ln 2 Q. The com- 
plexity relative to the Q=8 design is given by 

3 + ln 2 Q 
6 
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3.0 SEQUENTIAL DECODING 


3.1 Background , Sequential decoding is a procedure for 
systematically searching through a code tree , using received 
information as a guide, with the objective of eventually tracing 
out the path representing the actually transmitted information 
sequence. 

Most sequential decoder implementations to data have used 
some modification of the Fano algorithm. Briefly, the operation 
of the Fano algorithm is as follows. Starting at the first node 
in the code tree, a path is traced through the tree by moving 
ahead one node at a time. At each node encountered, the decoder 
evaluates a branch metric for each branch stemming from that node. 
The branch metric is a function of the transition probabilities 
between the received symbols and the transmitted symbols along 
the hypothesized branch. 

The decoder will initially choose the branch with the largest 
metric value (corresponding to the closest fit to the received 
symbols). The metric is then added to a path metric, which is 
the running sum of branch metrics along the path presently being 
followed. Along with the path metric, the decoder keeps track of 
the running threshold T. As long as the path metric keeps increas- 
ing, the decoder assumes it is on the right track and keeps moving 
forward, raising T to lie within a fixed constant. A, below the 
path metric. If, on the other hand, the path metric decreases at 
a particular node, such that it becomes less than T, the decoder 
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assumes it may have made a mistake and backs up. It will then 
systematically search nodes at which the path metric is greater 
than T until it finds a path that starts increasing again, or 
until it exhausts all nodes lying above T. At this point it is 
forced to lower T, and search again. Eventually it will find a 
path that appears to have an increasing path metric. 

Eventually, the decoder will penetrate sufficiently deep 
into the tree, that with high probability the first few branches 
followed are correct, and will not be returned to by the decoder 
in a backward search. At this point, the information bits corre- 
sponding to these branches can be considered decoded and the de- 
coder may erase received data pertaining to these branches. 

A major problem with sequential decoding is the variability 
in the number of computations required per information digit de- 
coded, The number of computations is a measure of the time re- 
quired to decode, for a fixed decoding speed in computations per 
second. A computation is defined, for the time being^ as either 
looking forward or backward one branch and evaluating and testing 
the metric involved. The cumulative distribution of computations 
performed per digit decoded, c, has been upper and lower bounded 
for the discrete memoryless channel by a Pareto distribution 
(Ref. 1, 2) , that is 

Prtc>L] rJ k L~ a , L> >1 , (3.1.1) 
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where k is a constant and a is determined by the relationship 

V a) 

r = _2 (3.1.2) 

where R is the code rate. 

Here E Q (a) is a convex function of a which is determined by the 

channel transition probabilities, which are in turn a function of 

E b /N 0 . This function has the properties that E q ( 0) «* 0, and 

E q ( 1) = R CO mp* Therefore, we can see from Eq. (3.1.2) that if 

R=R • _, a=l. R is the so called computational cutoff rate 

comp comp 

of sequential decoding. 

Because a>l for R<R com p> the average number of computations 

per node decoded is finite, but for rates greater than R q , this 

average is unbounded. Actually, for finite constraint lengths, 

the computation distribution drops off faster than Pareto for 

very large L. Thus, the average computation remains finite but 

large for R>R . 

o 

Because of the variability of the amount of computation 
required, there is a non-zero probability that incoming received 
data will fill up the decoder memory faster than old outgoing 
data can be processed. If the decoder tries to search a node 
for which received data has passed out of buffer memory, an over- 
flow is said to occur. When an overflow occurs, the decoder must 
have some mechanism for moving forward to new data, reacquiring 
code synchronization and starting to decode again. There are 
presently two techniques for doing this . One involves segmenting 
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the data into blocks. After each block, a fixed constraint 
length long sequence is inserted. Should the decoder buffer over- 
flow while decoding a given block, it can simply give up decoding 
that block and jump to the beginning of the next block to resume 
decoding. Code sync is immediately attained through knowledge of 
the fixed data sequence preceeding a block. This technique has 
the disadvantage that it requires an initial search to acquire 
block sync, and there is a loss in efficiency due to the insertion 
of known sync bits into the data stream. 

Another overflow recovery technique does away with data block- 
ing (Ref. 3). When an overflow occurs, the decoder jumps ahead to 
new data, and guesses the coder state at that point based upon 
received data. This technique is described in detail in a subse- 
quent section. 

The probability of overflow for sequential decoding can be 
related to the distribution of computations per bit only in an 
approximate manner. Suppose the decoder has a speed factor of y, 
that is, it is able to perform y computations per branch worth of 
data received. Suppose also, a decoder buffer capable of storing 
B branches worth of received data is used. With an initially empty 
buffer, the decoder may perform yB computations in progressing 
one bit deeper before an overflow occurs. Thus, from Eq. (3.1.1), 
the initial overflow probability is 

P Q = k(yB)" a (3.1.3) 
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Since overflows can occur through the concatenation of several 
shorter searches, one intuitively expects that the actual overflow 
probability would be larger than (3.1.3). However, as long as y 
is somewhat larger than the average number of computations per 
bit, simulations have shown (3.1.3) to be remarkably accurate. 

Of course, when an overflow does occur, many bits will be lost, 
whatever the restarting method. Thus, the rate of bits lost due 
to overflow will be 


P oB = LP o (3 - 1 ’ 4) 

When y is close to the average computations per bit, as is 
the case in a high data rate sequential decoder, simulation is the 
only reliable means of determining overflow frequency. 

The error probability with sequential decoding can be made 
as small as desired by increasing code constraint length. Long 
constraint lengths are practical for sequential decoding because 
decoder complexity is only a weak function of code length, unlike 
Viterbi decoding. 

It has been shown (Ref. 4) that for systematic codes, the 
undetectable error probability can be upper bounded by 

P (e) < k" 2 “ K(1 " R)R comp/ R (3.1.5) 

for R<R where K is the code constraint length. The actual 

comp 

achievement of this rate of decrease in P(e) with K is dependent 
on the choice of branch metrics for the decoder. This will be 
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With Syndrome 






discussed in section 3.2. 

3.2 Hard Decision Decoder . This section will deal in part 
with the interesting Fano algorithm modifications that offer po- 
tential advantages for a high speed hard decision sequential de- 
coder. Simulations will be reported for decoders using these 
modifications. Non-real time simulations have provided information 
on distributions of computations and undetected error rate. Real 
time simulations , using simulated decoder speed factors and buffer 
sizes, determine the overflow behavior of the decoder. 

Also considered are code synchronization, code selection, 
and data quality information. 

3.2.1 Syndrome Sequential Decoder . We will restrict our 
attention to systematic rate 1/2 convolutional codes with hard 
receiver quantization, for the sake of example. The technique 
generalizes easily to non-systematic codes, other rates, and even 
soft decisions. 

For a rate 1/2 systematic code, a received information bit 
and a received parity bit are input to the decoder at each bit 
time. To form the code syndrome, the received information bits 
are passed through a replica of the encoder and the generated 
parity bits are exclusive-ORed with the received parity bits. 

Fig. 3.2.1 shows a representation of the encoder, channel, and 
syndrome calculator for a K=3 code. The noisy channel is mod- 
elled by the mod-2 addition of occasional errors (ones) to the 
encoded information and parity streams. 
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Clearly, in the absence of noise, the syndrome bits input to 
the decoder are all zero regardless of the information sequence. 
This is because the parity bits generated in the syndrome calcu- 
lator, 2."* an ^ t ^ ie received parity bits, are both equal to 
the actual parity bits, jd. Thus, since the code and the channel 
action are linear, the syndrome is a function only of the noise 
sequences. We can assume, therefore, without loss of generality, 
that the all zeros code sequence is transmitted. 

This being the case, note that a single error in the informa- 
tion stream manifests itself in Fig. 3.2.1 as three consecutive 
l's in the syndrome. In general, an information error causes the 
code generator to be exclusive-ORed into the syndrome. Each pari- 
ty error, on the other hand, causes a single 1 to be exclusive- 
ORed into the syndrome. 

It can be shown that putting the received data into the form 
of a syndrome is information lossless. A decoder operating on s_ 
can perform as well as one operating on iL" and p". The function 
of a decoder operating on a syndrome sequence is to determine the 
most likely information and parity error sequences that could have 
resulted in that particular syndrome sequence. For a binary sym- 
metric channel, this corresponds to determining the minimum weight 
error sequence consistent with the syndrome. The decoder forces 
the syndrome sequence to zero, by exclusive-ORing a "1" where it 
believes a parity error occurred, and the code generator where it 
believes an information error occurred. 
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A syndrome sequential decoder keeps track of a metric as it 
"zeros" the syndrome. Each time it hypothesizes the occurrence 
of an error, the metric decreases. When it hypothesizes no error, 
the metric increases. If the decoder finds it has to correct too 
many errors in forcing the syndrome to zero, it will back up and 
change hypothesized information error decisions. Note that each 
information error decision affects the syndrome over a full con- 
straint length. 

Functionally, the syndrome decoder can be viewed as a box 
whose input is a syndrome sequence, and whose eventual output is 
an information error location sequence. This sequence is then 
used to correct errors in the received information sequence to 
form the decoder output. 

3.2.2 Algorithm Modifications 

3. 2. 2.1 Guess and Restart Overflow Strategy . The guess and 
restart technique was developed and successfully implemented in a 
sequential decoder previously (Ref. 3). When a buffer overflow 
occurs in a sequential decoder due to a long search, the decoder 
must jump forward in the syndrome stream and resume decoding. If 
the data is not blocked, the decoder does not have definite know- 
ledge of the coder state when it starts decoding again. What is 
required is knowledge of one constraint length of information bits 
at the point decoding is resumed. 

The best "guess" that can be made is that no information errors 
have occurred in the vicinity of the restart point. This "guess" 
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can be implemented by assuming a zero syndrome when restarting. If 
information errors actually occurred, the decoder will likely over- 
flow again, requiring another restart. If the guess was correct— 
an if the succeeding data is not too noisy— the decoder will work 
its way through the buffer and resume normal operation. 

The decoder skips over a segment of the syndrome in restarting 
after an overflow. The information error decisions are set equal 
to zero over this segment. Thus when an overflow occurs, the de- 
coder output corresponding to the unprocessed data will be the 
raw, uncorrected received information bits. These bits have 
errors occurring at the channel error rate. 

Simulations using guess and restart are presented in section 
3.2.4. 

3. 2. 2. 2 Quick Threshold Loosening . A sequential decoder op- 
erating below R comp (E b /N Q greater than 4.6 db in the case of a 
rate 1/2 code with hard quantization) spends much of its time 
plowing forward on the correct path, or in short searches, correct- 
ing single or double errors. Long searches contribute less and 
less to the average computation per bit as E^/Ng goes up. Reduc- 
tion of the time consumed just "plowing forward" or "keeping up 
with the data" is treated in section 5.0 of this report. Here 
we are concerned with reducing the number of short searches re-, 
quired. The technique of quick threshold loosening achieves this 
end for a hard quantized, rate 1/2 decoder. 

A single error in the received data stream is usually suffi- 
cient to cause the path metric to fall below threshold and initiate 

a backwards search. In general, the decoder must search backward 
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at this point, rather than lowering the threshold and resuming 
the forward search. This is because, in backing up, the decoder 
may find another path that it can follow without lowering the 
threshold. Had the decoder lowered the threshold when the metric 
first violated it, the possibility exists of getting on a path for 
which the algorithm will not allow further threshold tightening. 
Thus, the threshold cannot be lowered unless all accessable paths 
above the threshold eventually lead below it. 

For a rate 1/2 hard quantized decoder there are three pos- 
sible branch metrics. The first corresponds to two matches between 
branch code symbols and received data. The second is for one 
match and one mismatch, and the third for two mismatches. Fur- 
thermore, good codes always have the property that branches stem- 
ming from the same node have complementary code symbols on them. 
Suppose the symbol match metric is equal to 1, and the mismatch 
metric is -a (the optimum range for a is from about 9 to 11). 

This means that either one branch leaving a node has metric +2 
and the other -2a, or they both have metric 1-a. Fig. 3. 2. 2.1 
shows the way in which a typical short search is initiated due to 
a single error. 

When node A is reached for the first time, the threshold is 

tightened to the value T ^ . The decoder then proceeds on to node 

B, C and D. Looking forward from D the decoder sees a tie vote 

(branch metrics of 1-a on both branches). Both path metrics at 

nodes F and E are below T . This would normally initiate a back- 

i 

search thru nodes C, B and A, requiring tests of metrics at nodes 
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G , H and I. Then at node A the threshold would be lowered to T t -A 
and the decoder would step forward again thru B, C, D and then E or 
F. In this situation the decoder could have avoided the backsearch 
by lowering the threshold to T -A' when threshold T t was first vio- 
lated in looking forward from node D. This can be done because 
there is no other path remaining above T which can be followed. 
This "instant" threshold lowering can be done if A £ a-1. This 
restriction insures that a threshold violation on the path being 
followed determines that no other path remains above threshold. 
Quick threshold loosening is allowed only when the threshold has 
been previously tightened to its present value (it is tightened 
to Tj at node A in Fig. 3. 2. 2.1). If the threshold has been 
previously loosened all bets are off, since now a backsearch after 
threshold violation is necessary to insure non-existence of another 
path that remains above the present threshold. 

The quick threshold lowering scheme is easy to hardware imple- 
ment.* It only requires a one bit flag indicating whether or not 
the threshold had been previously tightened, and logic to prevent 
entering backsearch mode when a "quick threshold loosening" is 
possible . 

A simple modification of the "quick threshold loosening" 
scheme allows decoding past isolated pairs of error without initia- 
ting a backsearch to lower the threshold. 

Fig c 3. 2. 2. 2 shows a section of the code metric tree near a 
pair of symbol errors. The correct path segment consists of 

* G,D, Forney informed us this form of threshold loosening was 
implemented in Codex's 5 Mbps sequential decoder (Ref. 3). 
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Fig. 3. 2. 2. 2 Tree Section in the Vicinity 
of a Pair of Channel Errors. 
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branches from nodes 0 to 8. Symbol errors have occurred at branch 
levels F and H. If the initial two bits of the code generator are 
11, then not only do 2 branches stemming from a common node have 
complementary symbols, but the four branches stemming from these 
nodes must contain all four combinations of the two code symbols. 
This insures that a tie vote at a node (such as node 1 in Fig. 

3. 2. 2. 2) must be followed by another tie vote at one node (node 2), 
and by a complete match and a complete mismatch stemming from the 
other node (node 3). Therefore, when an error occurs, the metrics 
on all paths except one, must fall through at least two threshold 
levels. 

Suppose that the threshold has been tightened twice in a row, 
as i3 the case in going from node 0 to node 1. The decoder may 
go to node 3 and lower the threshold to T - A, since there are 

l 

no other unsearched paths in the T to T +A interval. At node 4 

l l 

the decoder faces another tie vote that takes the metric below 
the present threshold, T^-A. Now because of the properties of 
codes with generators beginning with 11, there cannot be any 
unsearched path segments in the range T^-A to T • Thus the 
threshold may be lowered again, and the decoder may continue on 
to nodes 7 and 8. 

This modification of "quick threshold loosening" is also 
simple to implement. An up-down counter which counts from 0 to 
2 is needed. Starting at "0", each time the threshold is tight- 
ened the count is incremented by 1, saturating at 2. When the 
threshold is violated in looking forward, it may be immediately 
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loosened if the count is positive. If so, the count is decrement 
by 1. When the count is zero, a backsearch must be initiated. As 
in ordinary "quick threshold loosening", we must have A = a-1. 

When stepping forward to a node in the case of a tie vote, care 
must be taken to not step first to the node having another tie 
vote following it (node 2 in Fig. 3. 2. 2. 2). This requires the 
ability to look at one syndrome bit into the future. 

These "quick threshold loosening" schemes have been simulated 
to determine their effectiveness in reducing the number of short 
searches. Fortunately the value A = a-1 is the near optimum 
choice of threshold spacing. These simulations, along with almost 
all of those reported here, use code reported in Ref. 3. Factors 
influencing the choice of code are discussed in section 3.2.3. 

Fig. 3. 2. 2. 3 shows the distribution of computations per bit de- 
coder with 

1) The unmodified Fano algorithm 

2) "Quick threshold loosening" 

3) Modified "quick threshold loosening". 

These simulations were performed with p = .02. Included in the 
figure is the average number of computations per bit, c, for each 
run. Both quick threshold loosening schemes clearly eliminate 
searches with computations below about 20 to 40. 

The modified "quick threshold loosening" is somewhat more 
effective than the loosening scheme described first. This is be- 
cause it is capable of eliminating searches due to more error 
patterns. Fig. 3. 2. 2. 4 shows a comparison of the computation 
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Fig. 3. 2. 2. 3 Distribution of Computations per Bit Decoded 
for Standard Fano Algorithm and "Quick Threshold Loos-- 
oning" Sequential Decoders. Threshold Spacing - 8, a=*9, 
Channel Error Rata * .02. 
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distributions for the standard Fano algorithm and modified quick 
threshold loosening decoders for p z .035. 

Neither scheme substantially affects the tails of the compu- 
tation distribution. This is because the long searches typically 
involve lowering the threshold by more than one or two levels, at 
which point in the search quick threshold loosening is disabled. 

Because of the simplicity and effectiveness of the scheme, 
quick threshold loosening was used in many of the simulations, 
including all of the real time simulations that will be described. 

3. 2. 2. 3 Look Ahead Sequential Decoding . Look Ahead Sequen- 
tial Decoding is another technique which attempts to improve the 
distribution of decoding computations. The standard Fano syndrome 
sequential decoder examines and acts upon the syndrome one bit at 
a time. If a path being followed is destined to fall below the 
current threshold, the decoder must follow the path until it actu- 
ally does fall below the threshold before backing up and changing 
direction. 

With look ahead decoding, the decoder examines N syndrome 
bits at a time. This is equivalent to looking forward N branches 
into the tree from the present branch. Using a table look-up 
procedure, the decoder determines if there is any path N branches 
ahead that satisfies the current threshold. If such a path exists, 
the decoder is allowed to step forward on a next branch if that 
can be done without violating the threshold. If no path exists N 
branches ahead that satisfies the threshold, then it is useless 

to allow the decoder to proceed forward. In that case, the look 
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ahead decoder initiates a backsearch, regardless of whether or 
not the next branch path metric satisfies T. 

This scheme clearly does not do anything the standard Fano 
algorithm does not eventually do; however, it has the potential 
for doing considerable less. It reduces the depth to which a 
path must be searched before it is rejected. 

Fig. 3. 2. 2. 5 compares the distribution of computations for 
sequential decoding with and without look ahead, with a look a- 
head parameter of N = 6 branches. Use of the look ahead decoder 
with N = 6 results in an improvement in the computational dis- 
tribution of about a factor of two. 

From this figure, it is apparent that while the distribution 
is lower, the Pareto exponent, or the slope of the curves, is the 
same. It can be shown analytically that no sequential decoding 
algorithm which "looks ahead" a finite number of branches can 
change the asymptotic slope of the computation distribution curve. 
The most that can be hoped for is a lowering of the constant in 
front of the distribution (k in Eq. 3.1.1). 

It is interesting to note that the effect of a scheme like 
looking ahead, which reduces the frequency of long searches, will 
be to reduce initial buffer overflow probability; while a scheme 
like "quick threshold loosening" will allow a full buffer to 
empty more quickly - allowing for a smaller decoder speed factor. 

Practically speaking, the look-ahead mechanism is not as 
simple to implement as "quick threshold loosening", especially in 
very high speed decoders. The syndrome table look-up is a complex 
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logical function which grows more complex exponentially with N. 

For N larger than about 6 it would have to be implemented using a 
read-only memory. Implementation would tend to slow down the 
decoder computation rate, which partially counteracts the improved 
computation distribution. Also, although the improvement in the 
distribution looks substantial, it only represents an improvement 
of about 0.1 db over the standard algorithm at points in the dis- 
tribution which affect overflow probability with the real-time 
decoder parameters that were studied. For these reasons, look- 
ahead decoding was not used in the real-time decoder simulation 
reported in section 3.2.4. 

3. 2.2.4 Sequential Decoding with Sideways Looks . When a 
sequential decoder is stepping back to a node in the code tree, 
and the branch it is stepping through is a "best branch" , its 
next step will be to look forward along the next best branch. In 
practice looking at the next best branch can be accomplished very 
conveniently before stepping back. If this node satisfies the 
threshold, and if the previous node also does, then a "step side- 
ways" can be made directly to the next best node. Thus, what was 
two computations becomes one computation. Sideways looks will be 
counted as one rather than two computations in the simulation which 
follows. 

3.2.3 Decoder Undetected Error and Computational Performance . 
The sequential decoder simulations described thus far, and the ones 
presented in this section are for decoders operating in a non-real 
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time mode. That is, received data is generated as the decoder 
needs it. The performance data gathered is, therefore, that for 
a decoder in which buffer overflows do not occur. Real time de- 
coder behavior, with a simulated buffer, is reported in the next 
section. 

3. 2. 3.1 Code Selection . Choosing codes is not as critical 
for sequential as it is for Viterbi decoding. Decoder complexity 
is not a strong function of code constraint length; so, the unde- 
tected error performance of a code can be improved by increasing 
K rather than trying to optimize a code for a given value of K. 
Still there are several reasons for having as good a code as 
possible. 

1) The guess and restart overflow technique performance 
degrades with increasing constraint length. This is 
because a constraint length worth of data must be 
correctly "guessed" to restart decoding. 

2) The constant, k, in the computational distribution 

is somewhat sensitive to the code. Good code distance 
properties will result in value of k=l or less. 

3) The encoder replicas in the decoder do grow linearly 
with K, resulting in some additional cost and complexity. 

It has been found through simulation that of the known codes, 
the various truncations of the rate 1/2 systematic codes due to 
Lin and Lyne (Ref. 7), and Bussgang and Forney (Ref. 3,5) proved 
best in undetected error rate, and distribution of computations. 
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Forney's extensions of Bussgang's code seem to have the 
slight advantage. Since this code has already been used in two 
high speed sequential decoder implementations we have concentrated 
our efforts on it. A search for more optimal codes is probably 
not worthwile because 

1) The number of codes of constraint length 40 or thereabout 
is huge. 

2) The truncations of Forney's code have minimum free dis- 
tances close to the upper bound on d f . 

Forney's code, to constraint length 45 has the generator 
715473701317465 in octal digits. The first two binary digits in 
the generator are 11, satisfying the requirement of the modified 
"quick threshold loosening" scheme. All simulations in this 
report use this code generator or its shorter constraint length 
truncations. 


3.2. 3.2 Decoder Parameters . Two decoder parameters which 
must be selected are the threshold spacing A, and the symbol mis- 
match metric, -a. 

Simulations have shown (Ref. 3) that decoder computational 
performance is not extremely sensitive to A. A broad minimum in 
c, the per bit average computation, exists centered about A = 10. 
This value of A is convenient, as it turns out, in that it meets 
the requirements of the "quick threshold loosening" scheme. 

Whereas computational performance is not strongly affected by 
A, it is very sensitive to the symbol metric ratio l/-a. Here 1 
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is the metric assigned to a match in code and received symbols, 
and -a is the metric assigned to a mismatch. When operating at 
R=R comp ( corres P° n 3ing to p = .0447 for rate 1/2), the value of a 
that maximizes the computational distribution Pareto exponent is 
a = 9.1. Any different value of a will decrease the Pareto expo- 
nent. However, as p decreases, R increases, and the optimal 

' comp 

value of a increases. At p = .035, the best choice for a is near 

10. From an implementational point of view, by far the simplest 

values of a to use are odd integers. Therefore, the choice seems 
to be between 9 and 11. 

It has been shown (Ref. 4) that for a systematic code, when 
a is chosen to optimize the distribution of computation, the error 
probability does not go down as rapidly with K as shown in the 
bound in Eq. 3.1.5. This means that significantly larger values 
of K are needed than would be necessary for a larger a. Our sim- 
ulations, which are in substantial agreement with those in Ref. 3, 
indicate, for instance, that for p = .0447 ( R comp = 1/2), about 

the same bit error rate is attained with K = 45 and a = 9 as with 

K = 37 and a = 11. That error rate was just above 10" 5 . Other 
simulations indicate that the behavior of the error rate with K 
is consistant with the bound of Eq. 3.1.5 when a = 11. Also, 
since the optimum a is closer to 11 than 9 when p is greater than 
.035 (which is really the range of interest for high data rate, 
low error rate operation) , we have elected to concentrate on simu- 
lation with a metric ratio of 1/-11. 
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3. 2. 3. 3 The Distribution of Decoding Computations . Before 
proceeding to present the computational statistics gathered through 
simulation, we redefine a computation to be consistant with what 
happens in a hardware computational cycle. For this purpose it is 
more accurate to define a computation as having occurred when the 
decoder steps rather than looks forwards, backward or sideways. 

This eliminates counting as a computation: for instance, a look for- 
ward by the decoder that does not result in a step forward. 

Using this new definitation Fig. 3.2. 3.1 present the distri- 
bution of computations per bit decoded for a range of channel 
error rates, p. For each value of p, 8 x 10 6 bits were decoded. 

The straight lines are best fits to a section of the tail of the 
distributions where significant data exists. The negative of the 
slopes of these lines are the measured exponents of the assumed 
Pareto distribution. As expected, the fit to the Pareto distri- 
bution is excellent. Table 3. 2. 3.1 shows the measured Pareto 
exponent a m vs. p. Also included is the theoretical Pareto expo- 
nent, cx T , which assumes the use of the optimum metric ratio, l/-a. 
Clearly, there is some degradation in the exponent for the higher 
values of p. (corresponding to operation near R com p) » however, 
for p >. .035, the measured and theoretical values are close. This 
indicates that the optimum range for a is broad when p is low. 

3.2. 3.4 Measured Undetected Error Rates . Table 3. 2. 3. 2 
shows the measured number of bit errors vs. p for the constraint 
length K = 33, 37, and 41 codes. In each case 8 x 10 6 bits were 
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Fig. 3.2. 3.1 Sequential Decoding .Distribution 
of Computations for Several Channel Error 
Rates, p. Runs are all of 8x10* Bits. K-41, 
A«10, Metric Ratio is 1/-11, , 
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E b /N 0 

P 

“t 

a m 

— 

c 

4.7 db 

.043 

1.05 

0.97 

7.08 

4.9 db 

.039 

1.16 

1.12 

3.09 

5.1 db 

.035 

1.28 

1.29 

2.03 

5.4 db 

.030 

1.46 

1.44 

1.50 

5.8 db 

.025 

1.67 

1.66 

1.25 


Table 3. 2. 3.1 Measured and Theore- 
tical Computational Distribution 
Parameters. K**37, A=10, Metric 
Ratio is 1/-11. 































p 

K 


33 

37 

41 

.043 

390 

205 

79 

.039 

128 

120 

0 

.035 

37 

0 

0 

.030 

25 

0 

0 

.025 

0 

0 

0 


Table 3. 2. 3.2 Undetected Bit Errors 
vs. p and K, For Decoder Runs 
of 8x10 s Bits Each. 





decoded. Only a small number of error events, if any, have 
occurred at the smaller values of p, making this data rather insig- 
nificant. At p =* .043, however, the error probability is 

— K/2 

fairly close to 2 ' , as the upper bound predicts. 

3.2.4 Real Time Sequential Decoder Simulation . In the real 
time sequential decoder simulation program, the decoder buffer 
waiting line is simulated by a counter. The counter is incre- 
mented every decoder computations to simulate the periodic 
arrival of received data. The counter is decremented when a new 
node level is reached by the decoder. Thus, the count represents 
the number of branches of received data in the buffer waiting to 
be decoded. When the count reaches a number equal to bhe simulated 
buffer size, an overflow is declared. At this time the guess and 
restart routine attempts to jump forward about one constraint 
length, and resume decoding. 

In a high data rate decoder, where the speed factor y is 
only slightly larger than c (the average number of computations 
per bit), the overflow probability will not obey the heuris- 
tically derived formula of Eq. 3.1*3. Also, when an overflow 
does occur, it will take many resync trials to successfully start 
decoding again. This is true because even when the correct 
syndrome guess is made, the decoder speed advantage is so slight 
and the buffer is so full, that the decoder will often overflow 
again. 

Fig. 3. 2.4.1 shows measured decoder composite output error 

probability vs. constraint length for a decoder with a buffer size 
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Fig. 3. 2. 4.1 Decoder Composite Output 
Error Rate vs. K. p=.035; y=2.5 
Buffer Size, B=64xlQ3. Each Point 
Represents 75x10 6 Bits Decoded. 
Measured c « 2.1 in All Cases. 



of 64 x 10 3 bits, a speed factor, p, of 2.5 and a channel error 
rate p « .035. Here composite output error rate refers to errors 
due to overflows, as well as undetected errors. In all cases the 
average number computations per bit was about 2.1. This figure 
is somewhat surprising in that error rate goes up with constraint 
length. This peculiar behavior can be explained as follows. With 
the longer constraint lengths, errors due to overflows completely 
dominate the composite error rate. At a constraint length of 45, 
the average number of related overflow events following an initial 
overflow is about 16. Each overflow causes about 300 raw undecoded 
information bits to be output by the decoder. Since the error rate 
on these undecoded bits is p = .035, each overflow results in about 
10 output errors. Since overflows occur in bursts of an average 
of 17 apiece, an overflow burst typically results in 170 output 
errors. 

Shortening the constraint length will cause undetected errors 
to take the place of some of the overflows. The average number of 
bit errors in an undetected error event is far less than 170. 

Thus, decreasing the constraint length has the net effect of 
improving the composite error rate. Of course, after a point 
undetected errors will dominate, and decreasing K further will 
increase the error rate. This point, however, has not been 
reached in going down to constraint length 30 , as shown in the 
figure. 

The reason that about 300 bits are output in uncorrected 
form in a overflow, is as follows. When an overflow occurs. 
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error position decisions up to 256 branches back from the point 
of deepest decoder penetration are considered unreliable, and are 
erased. The decoder then jumps over about a constraint length 
worth of untouched data, for a net of about 300 bits. The reason 
that 256 decisions are erased is that, in a long search, the de- 
coder may back up and change decisions as many as 200 nodes back 
or more. The most recent nodes must therefore not be considered 
finally decoded until the decoder has progressed at least 200 
nodes deeper into the tree. 

Fig. 3. 2. 4. 2 shows composite bit error rate, in the neighbor- 
hood of 10" 5 , vs. p for two decoder speed factors, y = 2.5 and 
y = 1.5. Clearly the curves are extremely steep. Operation at a 
speed factor of 1.5 with a bit error rate of 10" 5 requires a de- 
crease in p of only about .006 compared with operating at the same 
bit error rate with a speed factor of 2.5. 

This comparison can be made even more dramatic if it is put 
in terms of data rate and E^/N . Suppose we have a sequential 
decoder which is capable of 100 mega-computations per second. At 
a data rate of 40 Mbps, the speed factor is 2.5. A composite bit 
error rate of 10" s is achievable with a p of about .033 or an 
E^/N = 5. 2db. 

Likewise the decoder can operate at a data rate of 66 Mbps 
with a p of about .027, or E,/N = 5.7 db. The decoder data rate 

a o 

can be increased by over 50%, with the same output error rate, at 
the expense of only about 0.5 db! 
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Bit Error Rate 




Clearly there is a continuity of data rates and E, /N that 

D o 

correspond to a composite bit error rate of 10" 5 . Fig. 3.2.4. 3 
shows a curve of such data rates vs. E b /N Q . The curve is an 

interpolation and extrapolation of the two points obtained 

from Fig. 3. 2. 4. 2. 

3.2.5 Erasures vs. Undetected Errors . In some applications 
erasures are not nearly as bad as undetected errors. If this is 
the case, output error can easily be decreased by orders of 
magnitude by 

1) Increasing K until overflow errors dominate 

2) When an overflow occurs declare the 300 or so bits 
affected by the overflow as erased. 

This results in a bit erasure rate of about 1/p times the 
original composite error rate. However, undetected error rate 
can be as small as desired, depending on K. 

3.2.6 Systematic vs. Nonsystematic Codes . All of the 
simulations reported were run using systematic codes. Systematic 
codes have the advantage over nonsystematic codes that in the 
event of decoder failure, the raw information bits are available 
directly as back-up. This advantage has been obviated, to a 
degree, by the invention of the "quick-look" nonsystematic codes 
(Ref. 6). Use of these codes allows for simple information bit 
generation without a decoder. The resulting information stream 
does have an error rate of about 2p, however. There are two main 
advantages in using nonsystematic codes 
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1 ) 


A constraint length of only about half the systematic 
code constraint length is required for the same detec- 
ted error rate performance, and 

2) The optimum metric ratio, on the basis of the distribu- 
tion of computations, also results in the achievement 
of the optimum undetected error probability exponent. 

This is unlike the case for systematic codes, where a 
larger value of "a" is required for optimal error 
performance. 

The metric ratio advantage in 2) is completely lost at 
p > .035. In this most interesting range, for high speed decoders, 
the optimum metric ratio range is broad enough to allow joint 
optimization of the computation distribution and error exponent 
with systematic codes. 

With the guess and restart overflow strategy, some form of 
"guess" of undecoded information bits must be output when the 
decoder overflows. Using a non-systematic code, these "guesses" 
will be less reliable than with a systematic code. 

For these reasons we have elected to concentrate on system- 
atic codes. 
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3.2.7 Code Synchronization and Channel Reliability 
Prediction . Code synchronization and channel reliability can be 
handled in much the same manner as discussed in conjunction with 
Viterbi decoding. Recall in that case that an up-down counter 
was used, which counted up k on the occurrence of an error and 
down by 1 at each bit time, never going below zero. Code sync 
3tate is changed when the count exceeds a value T. In the 
sequential decoder it is even simpler. The counter now counts 
up by one when a threshold is loosened, and down by one when a 
threshold is tightened. The skewing of the count is done auto- 
matically by the skewed metric ratio. False alarm, and resync 
rates are directly obtainable from the E fa and E r£j curves in 
Figs. 2. 2. 5. 4 and 2. 2. 5. 5. 

3.3 Soft Decision Sequential Decoding 

3.3.1 Syndrome Decoder . A syndrome sequential decoder can 
be used to advantage with a soft decision decoder as well as a 
hard decision decoder. The syndrome is formed using only the hard 
quantized information in the received data. Passed on to the 
decoder, along with the syndrome bits, is 2 data quality bit per 
branch for 4 level quantization, or 4 quality bits per branch for 
8-level quantization. 

The decoder uses the 3 or 5 bit per branch information to 
generate information error decisions just as in the hard decision 
case. Now, however, efficiency is improved by the availability 
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Fig. 3.3.1 Sensitivity of Measured 
Pareto Exponent to Deviations 
From the Proper Metric Bias. 
Eb/No=2.0 db; Rate 1/3 Code; 
8-Level Quantization. 
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of the quality data. The error decisions are used to correct 
the stored, raw, hard quantized received information bits. 

3.3.2 Fano Algorithm Modifications . The guess and restart 
overflow strategy is applicable to a soft decision decoder. Here 
the "guesses" will be less frequently correct, because of the 
lower required E^/Nq with a soft decision decoder. 

Quick threshold loosening does not carry over to soft deci- 
sions. The wider range of branch metrics makes it hard, if not 
impossible, to take advantage of code and metric structure to 
allow quick threshold loosening. 

3.3.3 Sensitivity to Incorrect AGC . Unlike Viterbi decoding, 
soft decision sequential decoding is extremely sensitive to im- 
proper threshold level setting, due to inaccurate or drifting 
AGC. This is illustrated in Fig. 3.3.1. In this figure, the 
effect on the Pareto exponent of variations in the decoder metric 
bias as a percentage of the maximum branch metric are shown. 
Clearly, even these small changes in metric bias result in large 
changes in computation distribution. 
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3*3.4 Comparisons of Soft and Hard Decision Sequential 
Decoders . An 8- level quantized sequential decoder, with the same 
size received data buffer as a hard decision decoder, can buffer 
only 1/5 as many received branches. Also, the increased com- 
plexity of the decoder logic will mean a computation speed reduc- 
tion of about a factor of 2 under that of a hard decision decoder 
(see section 3.4). 

This along with the AGC problem, and the nonexistence of a 
"quick threshold loosening" scheme will reduce the 2 db perform- 
ance advantage, inherent in fine receiver quantization, over hard 
decisions. In fact at very high speeds, where speed factors are 
low, there may be no performance advantage to a soft decision 
decoder. For these reasons - especially the AGC sensitivity, we 
cannot recommend soft decision sequential decoding for very high 
speed communication. 

A technique is described in section 5.0, however, which 
uses a Viterbi predecoder before data enters a soft decision 
sequential decoder. This technique effectively improves the 
sequential decoder speed factor, and lowers the amount of memory 
required, and may provide a means of attaining the efficiency of 
a soft decision sequential decoder at high speeds. 

3.4 Sequential Decoder Implementation . This section consid- 
ers implementation techniques for syndrome sequential decoders 
operating at data rates from 1 to 40 Mbps. Trade-offs consid- 
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ered are choice of logic family and hard vs. soft decisions. 

A block diagram of a syndrome sequential decoder is shown in 
Fig. 3.4. The input from the channel is separated into two bit 
streams, one containing received information bits and the other 
containing the received check bits. The received information 
bits are passed through an encoder which is identical to the one 
used at the transmitter, and are exclusive-ORed with the received 
check bits, thus generating the syndrome. The syndrome is then 
stored in a random-access memory. In the case of soft decisions, 
the quality bits are stored in the memory with the syndrome. 
Meanwhile, the information bits are stored in a delay line which 
is equal in length to the total delay through the random-access 
memory and CPU. The processer reads the syndrome bits (and quality 
bits, if any) from the random-access memory and, using a modifica- 
tion of the Fano algorithm, determines a likely information error 
sequence. The decoded information error sequence is then read 
back into the random-access memory. The information error se- 
quence remains in the random- access memory until the correspond- 
ing received information bits are shifted out of the delay line. 

The information bit sequence and the information error sequence 
are then exclusive-ORed, correcting any errors present in the 
received information sequence. The resulting corrected data is 
the decoder output. 

All implementations considered would use semiconductor 
memory devices to form the main memory, TTL logic circuits for 
memory buffer, control, and syndrome generation and either TTL 
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or MECL for input and output interface circuits as required by 
the data rate. The choice of logic family for the CPU section 
is more critically dependent on the maximum data rate desired. 

The implementation study has found that the maximum compu- 
tation rates for MECL III, MECL II, and TTL are, respectively, 
100, 25, and 13 megacomputations per second. The corresponding 
data rates for speed factor 2.5 are, respectively, 40, 10, and 
5 Mbps. The relative cost factors are the same as for the 
Viterbi decoder, i.e., 18, 3, 1 for MECL III, MECL II and TTL, 
respective ly . 

The trade off study for Viterbi decoding showed that. the 
most cost effective way to obtain 40 Mbps decoding was to par- 
allel four 10 Mbps TTL decoders. Such is not the case for 
sequential decoding. 

A 40 Megabit sequential decoder could be obtained by build- 
ing one decoder with a MECL III CPU, or four decoders in parallel 
with MECL II CPU's or eight decoders in parallel with TTL CPU's. 
In the case of the MECL III decoder, CPU accounts for approxi- 
mately 60% of the total cost and the memory and I/O circuitry 
account for 40%. The memory and I/O circuitry for a MECL II 
CPU decoder would run about 85% of the cost for the MECL III de- 
coder. The memory and I/O for a TTL CPU decoder will cost about 
75% of that required for the MECL III decoder. The results for 
40 Mbps are shown in the following tables 
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CPU 

TYPE 

COST OF 
MEMORY 

COST OF 
CPU 

RELATIVE COST 
OF 1 DECODER 

NUMBER OF 
DECODERS 

RELATIVE 
TOTAL COST 

■■■■■■■■■ 






MECL 

III 

.4 

H 

1.0 

1 

1.0 

MECL 

II 

.35 

.1 

0.45 

4 

1.8 

TTL 

.3 

.03 

0.33 

8 

2.64 


Thus, the MECL III design turns out to be the most inexpensive for 
this data rate. The MECL III decoder for hard decisions is pre- 
sented in detail in the following section. 

3.4.1 40 Mbps Sequential Decoder . A detailed block diagram 

of the 40 Mbps sequentail decoder is shown in Fig. 3.4.1. The 
input from the channel is accepted in either of two forms; as in 
serial bit stream at a maximum rate of 80 Megasymbols per second, 
or as two parallel lines at a maximum rate of 40 Megasymbols/ 
second each, corresponding to a maximum data rate of 40 Megabits 
per second. 

The parallel inputs are provided for decoding QPSK modulated 
data. Received information and parity bits are routed separately 
from the modem to the decoder. 90° demodulator phase ambiguities 
are resolved by the code synchronization circuitry. 

The 80 Mbps serial input is for use with BPSK modems. In 
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this case, the interleaved received information and parity bits 
are decommutated in the decoder. Node synchronization is pro- 
vided again by the code sync circuitry. For both BPSK and QPSK, 
180° phase ambiguities will be handled by using codes trans- 
parent to 180° phase flips and differentially encoding and de- 
coding the data. 

The input circuits exclusive of the serial input decom- 
mutation are implemented using MECL II logic. The input lines 
are then buffered down to eight parallel lines operating at a 
maximum rate of 10 Megabits per second per line. The eight result- 
ing lines are then converted to TTL logic levels and delivered to 
the encoder replica and buffer. 

The syndrome is then formed from the received information 
and parity bits. The syndrome is collected into 72 bit words 
and delivered to the random-access memory. The random-access 
memory is a semiconductor memory consisting of 1,024 words of 72 
bits each. The memory has a read/write cycle time of 400 nano- 
seconds. The syndrome words are written into sequential locations 
in the random-access memory. Prior to writing the syndrome word, 
the present contents of the addressed word are read out of the 
memory. This word contains the decoded information error 
sequence. The word is sent on to the output buffer where it is 
exclusive-ORed with the delayed information bit sequence. 

The random-access memory has two ports. The input-output 
section is connected to the first port and has priority. The CPU 
buffer is connected to the second port. The CPU buffer reads a 
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72 bit word out of the random-access memory and delivers it to 
the CPU four bits at a time whenever the CPU requests new data. 
After reading out the new syndrome word, a new information error 
word is written into the memory at the same address. When the CPU 
buffer memory address catches up to the input-output address, the 
CPU buffer must wait until a new word is written into the random- 
access memory from the input-output section. When the CPU buffer 
falls so far behind the input-output section that the input-out- 
put section attempts to write over an undecoded word, a buffer 
overflow is declared. Whenever an overflow occurs, a signal is 
sent to the external equipment indicating the next word of data 
is likely to contain errors. The overflow signal is also sent to 
the Fano algorithm logic so that it may restart at a point further 
ahead in the memory. 

When the CPU section reaches the front of the back-up buffer, 
a new four bit syndrome word is requested from the memory section 
via the CPU buffer. This four bit word is exclusive-ORed into 
the encoder. The encoder is capable of shifting in either direc- 
tion. If the algorithm logic determines that the present node 
contains an information bit error, then the code impulse function 
is exclusive-ORed into the encoder. Likewise, when the decoder 
is backing up, the impulse functions that were exclusive-ORed 
into the encoder while proceeding forward must be removed. The 
bits shifted out of the right hand side of the encoder while 
moving forward correspond to the check bit error sequence. These 
bits are shifted into the back-up buffer along with the informa- 
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tion error sequence. When the decoder is backing up, the 
original syndrome is reconstituted by shifting the check error 
sequence back into the right hand side of the encoder, and by 
exclusive-ORing in the code impulse function wherever an infor- 
mation error previously was hypothesized. The syndrome bits are 
shifted out of the left hand side of the encoder when backing up 
and are shifted into the back-up buffer. 

Functionally, the back-up buffer is a right-left shift 
register that is two bits wide and 256 bits long. The back-up 
buffer is actually implemented by using a very fact ECL random- 
access memory that is addressed by up-down counter, thus giving 
the effect of a right-left shift register. The function of the 
algorithm logic is to direct the progress of the CPU through the 
decoding tree. The algorithm logic determines whether the decoder 
may proceed forward or backward and determines the changes in the 
decoder metric. 

3.4.2 Code Synchronization . The function of the synchroni- 
zation circuit is to obtain correct code sync. This is accompli- 
shed by comparing the rate of metric threshold loosenings with 
the rate of metric threshold tightenings. If the code sync state 
is correct, then the rate of metric tightenings will exceed that 
of metric loosenings. However, if the code sync state is incorrect, 
then metric loosenings will exceed the rate of metric tightenings. 
The sync circuit contains a counter which is counted up every 
time a metric threshold loosening occurs and is counted down 
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every time a metric threshold tightening occurs. The counter has 
a reflecting boundary at zero. If the counter overflows while 
counting up, a bad sync state is declared and the code sync state 
is changed. Code sync resolves the 90° phase ambiguity in the 
QPSK input case and accomplished node sync in the BPSK case. The 
counter is large enough that the probability of falsely declaring 
a bad sync state is negligible compared with decoder error rate. 
The code sync state is changed if the input is from a PSK modem 
by inserting or deleting a one symbol time delay. If the input is 
from a QPSK modem the sync state is changed by interchanging the 
two symbols and inverting one of them. Also, when the sync state 
is changed the CPU buffer address is set equal to the input-output 
address and the CPU is restarted at that point. 

3.4.3 Input Buffer . A detailed logic diagram of the input 
buffer is shown in Fig. 3.4.3. The inputs from the channel are 
each received on twisted pairs which are terminated in the charac- 
teristic impedance of the line, and then fed to a set of line 
receivers. The output of the clock line receiver is then fed to 
the clock conditioning circuit which provides all of the necessary 
clocks to the decoder. The data inputs are then fed to three 
flip-flops which format the various inputs into two parallel 
lines, one line containing the information bit stream and the 
other line containing the check bit stream. 

If the serial PSK input is used, the three flip-flops are 
connected as a three bit shift register. The information bit 
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output is taken from the first register and the check bit is 
taken from the second register in one sync state. In the other 
sync state# the information bit is taken in the second flip- 
flop and the check bit is taken from the third flip-flop, thus 
resulting in a selective insertion of a one symbol time delay as 
required to obtain node sync. 

If the input is on two parallel lines from a BPSK modem, 
the first bit is shifted into the first flip-flop and the second 
bit is shifted into the second flip-flop at each bit time. Also, 
the output of the first flip-flop is shifted into the third flip- 
flop. The information bit and check bit may now be taken from 
the three bit register as in the previous case thereby obtaining 
node sync. 

If the input is from a QPSK modem, then the first line is 
shifted into the first flip-flop. The complement of the first 
line is shifted into the third flip-flop. The second input line 
is shifted into the second flip-flop. Code sync is then obtained 
by using the same set of switches as in the previous case. 

The information and check bits are each shifted into a four 
bit shift register. Every four bit times, the contents of the 
two shift registers are loaded into two four bit latches. The 
two four bit latches each drive a quad MECL II to TTL converter. 
The resulting TTL signals are then delivered to the rest of the 
decoder. 


3.4.4 Received Information Bit Storage . Received Information 
Bits must be stored in a 72,000 bit long delay line while the 
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decoding is taking place. A block diagram of the information bit 
storage is shown in Fig. 3.4.4. This storage is accomplished by 
using dynamic MOS shift registers, since this form of storage is 
presently the most inexpensive available. Since the MOS shift 
registers used are not capable of operating at 40 Megabits, a 
number of registers must be operated in parallel to obtain this 
effective speed. 

The information bits from the input buffer are collected 
into 36 bit words, using a serial to parallel converter followed 
by a 36 bit latch. The output of the 36 bit latch drives 36 
2,048 bit long shift registers. The output of the shift registers 
are clocked into a 36 bit latch. The latch drives a parallel to 
serial converter which converts the information bits back to four 
parallel lines. Since the MOS registers are dynamic shift regis- 
ters, there is a minimum clock rate of 10 KHZ which corresponds 
to a data rate of 300 kilobits. If it is desired to provide for 
operation of data rates lower than 300 kilobits, the following 
technique could be used: Below 300 Kbps, the dynamic shift reg- 

isters are replaced by static registers of reduced length. The 
total storage is reduced by a factor of 128. The speed factor - 
buffer size product remains higher at 300 Kbps than at 40 Mbps. 

3.4.5 Syndrome Generator . A logic block diagram of the 
syndrome generator is shown in Fig. 3.4.5. The syndrome generator 
receives the information bits and check bits from the input sec- 
tion in two words of four bits each. The syndrome generator 
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computes the syndrome, four bits at a time, thereby permitting it 
to operate at the data rate divided by four. The syndrome 
generator is implemented using SN7495 shift registers and SN7486 
quad modulo two adders. Since this logic is fast enough to permit 
only one level of mod two addition between the flip-flops, all 16 
possible mod two combinations of the four parallel information 
bits must be computed. This is accomplished in two levels of 
flip-flops. In the first level, information bits one and two 
and information bits three and four are each modulo two added and 
stored in two flip-flops. In the next level, the nine remaining 
combinations are computed, thus resulting in the signals 1-1 thru 
1-15. The check bit word is delayed three clock times, so that it 
appears at the appropriate time in respect to the information 
bits. The remainder of the syndrome generator consists of four, 

12 bit long shift registers with a modulo two adder between each 
stage. One input to each mod two adder comes from the previous 
stage in the register. The other input comes from one of the 1-1 
thru 1-15 signals. The output of the next to the last stage of 
the syndrome generator is a set of computed check bits. The 
syndrome is formed in the last stage of the syndrome generator 
by exclusive-ORing the four received check bits with the four 
generated check bits. The resulting syndrome is delivered to the 
memory. 


3.4.6 Decoder Memory . The function of the decoder memory is 
to store the syndrome while it is awaiting processing by the CPU 
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section of the decoder. A block diagram of the memory is shown in 
Fig. 3.4.6. Since the number of computations required to decode 
one bit is a random variable, and the rate of computations is 
fixed, a large amount of storage is required in order to make 
the probability of buffer overflow sufficiently small. Simula- 
tions have shown that a memory size of 72,000 bits will provide 
the required performance with 40 Megabit data and a computation 
rate of about 10 8 Fano algorithm computations/second. The 
previous two quantities also set the throughput rate of the mem- 
ory, since the CPU must be able to access about 10 8 bits per 
second in order to achieve its maximum computation rate. An 
additional 40 Megabits per second must be accessed in order to 
store and retrieve the syndrome input and the information error 
sequence output; thus, the total throughput requirement of the 
memory is about 140 Megabits per second. 

At the present time, economical semiconductor random-access 
memories are available with read/write cycle times in the neigh- 
borhood of 300 to 400 nanoseconds. If a 400 nanosecond memory 
is used, then a very wide word is necessary in order to obtain 
the required throughput rate. Random-access memory cards are 
presently available from several manufacturers containing 1,024 
words of 18 bits each in the required speed range. Combining 
three of these cards to form a 54 bit word does not quite meet 
the required throughput rate, consequently four cards will be 
used, resulting in a 72 bit wide word, in order to achieve the 
desired througnput rate. 
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The memory must have two ports; one port to serve the input- 
output and the other port to serve the CPU. The input-output 
port must have priority so that data will not be lost. 

The syndrome is collected into 72 bit words for access into 
the memory. When 72 bits have been collected, the memory timing 
and control circuit is signalled and the next available cycle is 
given to the input-output. A 72 bit two input multiplexer selects 
the syndrome word for data input into the random-access memory. 
During the cycle, the word stored in the present input-output 
address is read out and loaded into a 72 bit latch. After the 
read operation, the 72 bit syndrome word is written into this 
address and the cycle is completed. The word read into the 72 
bit latch is then converted into a four bit parallel line and is 
sent to the output circuit. When the CPU requires a memory access, 
the 72 two input multiplexers and the 10 two bit address multi- 
plexers connect the CPU address counter and the CPU data lines 
to the memory and the cycle is initiated. The memory addresses 
are stored in two 10 bit counters , one for the input-output and 
one for the CPU. The memory timing and control circuit controls 
the read/write process. This circuit also generates a signal 
whenever a buffer overflow occurs. 

3.4.7 CPU Buffer . A block diagram of the CPU buffer is shown 
in Fig. 3.4.7. The function of the CPU buffer is as follows; 

When a 72 bit syndrome word is read from the memory, it is stored 
in the 72 bit latch. When the four 18 bit shift registers are 
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empty, the contents of the 72 bit latch are loaded in parallel 
into the four 18 bit shift registers. The contents of these 
registers are then shifted out to the CPU four bits at a time. 

With each shift, a four bit word is shifted into the four 18 bit 
shift registers from the CPU. Thus, after 18 shifts, the old 
word will have been shifted into the CPU and a new 18 bit word 

will have been shifted back into the shift register. At this 

time, the present contents of the four 18 bit shift registers 
are loaded in parallel into the second 72 bit latch and the con- 
tents of the first 72 bit latch, which now contains a new word, 
is then loaded into the four 18 bit registers and the process 

continues. If a new 72 bit word is not yet ready, then the CPU 

waits until a memory access can be obtained. The latches and the 
shift registers are implemented using SN7495 devices. 

3.4.8 CPU . A block diagram of the CPU is shown in Fig. 

3. 4. 8.1. The four bit words from the CPU buffer are translated 
from TTL logic levels to MECL logic levels and stored in a four 
bit latch. The contents of the four bit latch are loaded in 
parallel into the encoder every four computations when the decoder 
is proceeding forward to new nodes. A logic diagram of three 
stages of the encoder is shown in Fig. 3. 4. 8. 2. The encoder is 
capable of shifting in either direction. The code impulse 
function can be exclusive-ORed with the present contents of the 
encoder on a shift in either direction. The encoder may also be 
synchronously reset. Each stage of the encoder consists of one 
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flip-flop and one quad two input NOR gate. The output of the 
right hand side of the encoder is the hypothesized check error 
sequence. This sequence is stored in the back-up buffer. When 
the decoder is backing up, the bits that are shifted out of the 
left hand side of the encoder are stored in the back-up buffer. 

If an information bit error is hypothesized at a node, then the 
code impulse function is exclusive-ORed into the encoder, and a 
one is shifted into the four bit information error register. 

The information error sequence is also shifted into the back-up 
buffer when proceeding forward. When backing up, the information 
error sequence is returned to the four bit register and the ori- 
ginal syndrome is reconstituted in the encoder by shifting in the 
check error sequence and by exclusive-ORing the contents of the 
encoder with the code impulse function at every node at which an 
information error was hypothesized. The information error 
sequence that is shifted out of the back-up buffer when proceeding 
forward is shifted into a four bit right/left shift register. 

Every four shifts forward to new nodes, the contents of this 
register are loaded into the four bit latch. The contents of the 
latch are then translated from MECL logic levels to TTL logic 
levels, and returned to the CPU buffer. The function of the 
back-up buffer is that of a 256 bit long, two bit wide, right/ 
left shift register. Since this size register would be prohib- 
itively expensive if implemented using MECL III logic, the 
function is actually implemented by the use of a very fast ECL 
random-access memory. A random-access memory can be made to look 
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like a right/left shift register if the random-access memory is 
addressed by an up-down counter, which counts up when a right 
shift is desired and counts down when a left shift is desired. 

The RAM is a 64 word, eight bits per word, ten nanosecond access 
time memory that is manufactured by Advanced Memory Systems. The 
RAM is interfaced to the CPU through the two, four bit, right/ 
left, parallel load registers and the two, four bit, buffer 
registers. A six bit up-down counter provides the address. 

The algorithm logic controls the progress of the decoder 
through the decoding tree. Each computation the algorithm logic 
decides, 1) whether to move forward, backward, or sideways, 2) 
whether the node being computed has an information bit error, 

3) whether to raise or lower the threshold, and 4) the amount by 
which the decoding metric should change. Two infrequently used 
functions are : 

1) back-up buffer overflow, and 

2) main buffer overflow 

When back-up overflow occurs, the decoder has come to the end 
of the back-up buffer and can go no further in a back search. In 
this situation, the algorithm logic lowers the threshold and goes 
back to forward searching. In the case of a main buffer overflow, 
the decoder must attempt to restart at some point ahead of its 
present location. The decoder jumps to the front of the back-up 
buffer, requests a new word from the main buffer, resets the en- 
coder, resets the metric register, and attempts to begin decoding 
in the forward direction. If the decoder is unable to restart, 
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the main buffer will soon overflow again and another restart 
attempt will be made. 

3.4.9 Physical Description . The CPU is implemented using 
MECL III logic. Approximately 250 MECL III IC's are required. 

The CPU can be packaged on one or two multilayer circuit boards. 
Five or six layers will be required. Two layers are reserved for 
ground and power planes. The remaining layers are required to 
provide the necessary interconnections. All circuit board runs 
longer than 2 inches are terminated strip transmission lines of 
50 ohm impedance. 

The MECL III stud mounted flat package is used. The packages 
are mounted with the stud passing through the circuit board. 
Cooling can be provided by soldering a U shaped fin to the stud 
on the side of the board opposite the component. Cooling air 
will then be blown across the fins. Using this technique, the 
temperature rise in the equipment will be less than 20 degrees. 
With 50°C ambient temperature, the maximum component temperature 
will then be 70°C. 

The remainder of the decoder is implemented using MECL II, 

TTL and MOS logic devices. Approximately 325 of these devices 
are required. They can be mounted on five or six circuit boards, 
using the welded stitch wire technique. The main buffer can be 
packaged on four additional circuit boards, and the back-up 
buffer can be packaged on one circuit board. 

The complete decoder, together with power supplies, can be 
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packaged in a standard 10^" high rack mountable chassis. All 
data input and output connectors can be mounted on the front panel 
together with all necessary controls. 

3.4.10 Modifications Required by Soft Decisions . As 
explained in previous sections, an improvement in E^/Nq of 1.5 db 
is available by increasing Q from 2 to 4. But the cost is a sub- 
stantial increase in the hardware requirement. 

A soft decision syndrome sequential decoder would form the 
syndrome as in the hard decision case by regarding the most 
significant bit of each symbol as a hard decision. This is most 
easily done if the quantizer levels are labelled by a sign- 
magnitude representation. The sign bit then corresponds to a 
hard decision. 

The syndrome is stored in the main memory along with the 
magnitude (or quality) bits for each symbol. This requirement 
triples the size of the main memory for Q = 4 since a syndrome bit, 
and one quality bit for each symbol must be stored for each node. 
The hard decision bit for the information bits is stored in a 
'delay line as in the case of the hard decision case. This re- 
quirement causes the hardware in the memory and I/O section of 
the decoder to increase by a factor of 2.5. 

The CPU uses the syndrome and quality bits to determine a 
likely information error sequence. This sequence is then stored 
in the main memory for later use in correcting the delayed infor- 
mation bit stream. All bit paths in the CPU must now be three 
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times wider than for a hard decision decoder, thus, tripling this 
portion of the CPU hardware. The algorithm logic becomes consid- 
erably more complicated since the metrics must be quantized to a 
much finer level than in the hard decision decoder. This results 
not only in an increased number of gates, but in an increased 
number of logic levels, thus slowing down the computation rate. 

The computation rate will be reduced from about 100 Megacompu- 
tations per second in the hard decision case to about 70 Mega- 
computations for Q s 4. The CPU hardware will increase by a 
factor of three. Thus the overall increase in hardware (and in 
cost) will be a factor of 2.8 relative to a hard decision decoder. 
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4 . 0 CODING FOR DATA OF VARYING SPEED AND ERROR RATE REQUIREMENTS 


Consideration has been given to the problem of transmitting 
a data stream on which, by time-division multiplexing, data from 
sources with different data rates and error requirements have been 
combined. Three approaches have been considered and analytical 
results obtained for each. These are designed to protect the 
high-reliability low-rate data by one of the following methods : 

a) concatenated coding (originally outlined in section 4 
of the proposal for this contract.) 

b) lengthened symbol times for the low-rate data 

c) use of lower rate codes for the low-rate data 

To obtain specific results, it was assumed that the ratio of 
high-rate to low-rate is no greater than 10 and that the required 
high-rate bit error probability is 10" 3 , which probably represents 
a worst case. 

Also, the basic code for the high-rate data was taken to be 
the best rate 1/2 constraint length 4 convolutional code. While 
this is used primarily because it has been thoroughly analyzed 
and simulated, it is also a reasonable candidate for data rates 
above 10 Mbps with Viterbi decoding. 

4.1 Concatenated Coding . The goal is to find a very simple 
outer code to be used on the low data-rate source only, which will 
decrease the error probability to the desired level. The overall 
coding system is shown in Fig. 4.1. If the outer code has rate 
1/2, the overall E, /N is increased only by 0.4 db. 
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In order to render the inner code errors nearly independent, 
interleaving must be introduced. Five constraint lengths of 
interleaving seem more than sufficient. Thus, an interleaving 
memory of 20 bits is all that is required for constraint length 4. 

The outer code then operates essentially on a binary-symmet- 
ric channel with crossover probability p=10“ 3 . A two-error 
correcting BCH code is unsatisfactory because it requires R=7/15 
and only achieves 

p ° : h fr) p ' = 2 - 1 * 

A more satisfactory block code, the Golay (24, 12) three-error 
correcting code has R=l/2 and 

Pb ' ? (It) (r) p " - 1771p " = 1 - 7 x 10 ‘ 

However, this requires a moderately complex decoder when operating 
at high data rates. Block code synchronization may add to the 
complexity if not already provided by a separate framing reference 
A more suitable approach is to use a convolutional outer code 
The two-error correcting, rate 1/2 code (Fig. 4.2a) yields for the 
BSC 

P B = 50p 3 = 5 x 10“ 8 

while the best three-error-correcting rate 1/2 code yields (Fig. 
4.2b) 
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P B ~ 560p 4 = 5.6 x 10" 


1 o 

In each case, the results are better than the corresponding block 
codes, the decoder is simpler, and code synchronization is not 
required for the convolutional code. Node synchronization, which 
is required, can be obtained using the methods outlined in 
section 2.2.5. 

While Viterbi decoding is required to obtain the above 
results, a feedback decoder, the L-1011 presently marketed by 
LINKABIT Corporation, obtains nearly equivalent results for the 
BSC, and affords a much simpler mechanization for both the two- 
and three-error correcting decoders. 

4.2 Lengthened Symbol Times for Low- Rate Data . This is 

undoubtedly the simplest approach. If a rate 1/2 outer code were 

used for the low-rate data, its effective E, /N would be increased 

D o 

by 3 db (resulting in an overall increase of 0.4 db) . Rather 
than using a code, the low rate symbols might simply be repeated 
or lengthened. To achieve P B =10” 3 with a K=4 code for the high- 
rate data, the required E^/Ng = 3.75 db as established by analysis 
and simulation. Thus, using this approach for the low-rate 
data, we would have Efc/No = 6.75 db. For high E^/No, the bit 
error probability (without quantization) for this code is asymp- 
totically 

P D ~ 2 erfc / 6E,/N 0 
B a 1.1 x lO" 7 ° 

This figure may be slightly optimistic if 8-level quantization is 
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used 


4.3 Lower Rate Codes . Instead of doubling the symbol length, 
we may keep this fixed but double the humber of symbols, thus 
changing from an R=l/2 code for the high-rate data to an R=l/4 
code for the low-rate data. Preliminary analysis indicates that 
the rate 1/4 constraint length 4 code with minimum bit error 
probability at high E b /N 0 has the generator matrix 

G = 

and that for this code at E b /N 0 = 6.75 db 

P fi ~ 4 erfc /?.'?E b /N 0 t 6.5 x 10~ 8 

If such a lower-rate code is used for the low-rate data, 
separate decoders must be used for the high-and low-rate data. 

The two decoders may time-share some subsystems, such as the 
arithmetic unit, but in some respects, they must be distinct. 

In view of the very modest improvement, it is questionable wheth- 
er the complexity is warranted relative to the approach of 
doubling the symbol time, which requires virtually no additional 
decoding complexity. 

On the other hand, it also appears that concatenation with 
a simple convolutional outer code gains about two orders of 
magnitude in performance over the other two approaches. 
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Figure 4.1. 


Coding for data of varying error-rate 
requirements. 



(a) two-error correcting 



(b) three-error correcting 
Figure 4.2. Outer Convolutional Code 
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5.0 PREDECODING FOR A SEQUENTIAL DECODER: 

A HYBRID IMPLEMENTATION FOR VERY LOW ERROR PROBABILITY . 

5.1 Introduction . To achieve very low error rates with a 
sequential decoder at very high data speeds, it is necessary to 
operate well below R com p* For at high speeds, even a speed fac- 
tor of 2.5 may be prohibitively expensive, especially in a soft 
quantized sequential decoder. Furthermore, for very low sequen- 
tial overflow probability, large quantities of storage are required, 
particularly with 8-level quantization which requires six bits 
of storage per branch. 

On the other hand, if we operate well below R , most of 
the data can be correctly decoded by a short constraint length 
Viterbi decoder. As was pointed out in the original description 
of the algorithm (Ref. 1), a Viterbi decoder can decode a long 
constraint length (K) convolutional code treating it as if the 
constraint length were much shorter (k<<K), by operating only on 
the first k symbols of the convolutional code generators. Of course, 
when an error occurs, remerging to the correct path is extremely 
unlikely and generally all subsequent bits will be decoded incor- 
rectly. The point is, however, that most of the data can be 
correctly decoded in this way and only the more difficult (noisy) 
segments of data are incorrectly decoded, and if errors can be 
detected, then segments can be passed on to a more powerful sequen- 
tial decoder. 


*This technique was proposed by G. D. Forney, Jr., who was a 
consultant on this study. 
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The approach that is therefore suggested is to use a short 


constraint length Viterbi decoder (k*5) to " predecode " a long 
constraint length convolutional code, detecting the incorrectly 
decoded segments, and passing these on to a seguential decoder, 
which is more powerful since it utilizes the full constraint 
length (K>30) of the code. The mechanization block diagram is 
shown in Figure 5.1. 

We assume that data is encoded, in frames of 1000 bits, 
into a constraint length K=40, rate 1/2 nonsystematic convolu- 
tional code, each frame being followed by a tail of 39 known 
branches, to be used for resynchronization. The received demod- 
ulated data (soft or hard quantized) is passed first to a Viterbi 
predecoder operating as a decoder for a k=5 code corresponding 
to the first 5 symbols of the generator sequences. All decoded 
data is passed to a long digital delay line capable 6f storing 
D=256 decoded frames (256 K bits) . 

All undecoded received data from a given frame is also passed 

•k • 

to a one-frame buffer (6000 bits for 8-level soft quantized data ' 
1000 syndrome bits for hard quantized data) . The Viterbi decoder 
output is also monitored in an effort to detect all frames in 
which an error occurred. This can be performed in a number of ways. 

The probability that an incorrect path remerges with the correct 

— K 

path at any given node is of the order of 2 . Hence, with K=40, 


*This can be reduced to 5000 bits if we use a soft decision syn- 
drome decoder, as disaussed in Section 3.3. 
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Fig. 5.1 Hybrid Implementation of Sequential 
Decoder and Viterbi Predecoder. 











it will remerge at any one of 10 3 nodes with probability on the 
order of 10“ 9 . By merely observing K-k-l=34, decoded tail branches 
and comparing them with their known values, the probability that 
an error will not be detected is much less than 10“®. 

When an erroneous frame is detected, it is tagged for future 
reference and gated to a large B frame buffer to await processing 
by the sequential decoder. This buffer is divided into two segments. 
In the front portion is stored data from frames awaiting sequential 
decoding. In the rear portion are the already sequentially decoded 
frames awaiting insertion into the decoded data stream as it exits 
from the delay line. In all, there is storage available for B 
undecoded and B decoded frames. 

There remain the problems of determining a) the percentage 
of deletions, which establishes the sequential decoder load and 
buffer size requirements, b) the computational complexity of 
sequential decoding of the erroneously predecoded frames, and 
c) the overflow probabilities due to finite delay D and finite 
buffer size B. 

We shall consider these three problems in the state order. 

5.2 Deletion Probabilities of Predecoder . Deletions corres- 
pond to first event errors in a short constraint length convolu- 
tional code decoded by a Viterbi decoder. With soft quantization 
(Q=8) , a rate 1/2 k=5 code was simulated at an Ek/N 0 =4.5 db. Out 
of 1500 frames of 852 bits each, 32 were erased resulting in an 
estimated erasure probability p^ = ,02. Extrapolating to 1000 bit 
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frames, we estimate conservatively a deletion probability no 

greater than p^=.025. On the other hand, with E b /N 0 =4 db, 

the deletion probability exceeds 8 percent. 

With hard quantization, simulations were run at a crossover 

probability p*.025 which corresponds to E b /N 0 =5.7 db. The result 

of decoding approximately 5000 frames of 500 bits each was a 

deletion probability p, 5 .10. A particularly short trellis 

9 

memory (7 branches) was employed in order to make a resulting 

implementation particularly simple and inexpensive. In the 

following examples, we shall consider operating a hard quantized 

system at Ej^/Nq = 6.2 where p=.02 and the Pareto exponent is 

2.0. However, since we use longer frames (1000 bits), we shall 

use an estimated deletion rate p. = .10 even at this higher 

9 

E b /N 0 value. 

5.3 Computational Complexity of Sequential Decoding for 
Erroneously Predecoded Deleted Frames . Extensive simulations 
were performed on hard quantized data only. First 14,000 
frames with crossover probability p=.025 were sequentially 
decoded using the Fano algorithm with quick threshold loosening. 
The resulting distribution of computations was very closely 
approximated by the Pareto distribution with exponent p=1.67, 
which follows precisely from the theoretical result 

E o ( P) 

= P 

R 
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p=. 025 


=0.83 


since R=l/2 


and E (1.67) 
o 


Then with p.025, the 466 erroneously predecoded frames 
(approximately 10% of the total — see previous section) , were 
sequentially decoded. The distribution of computations was 
in all cases above the previous one, and at the high end it 
approached between 8 and 10 times the ordinary distribution. 

In retrospect, this is exactly as expected. Predecoding will 
correctly decode all the easier cases but generally fail on 
most frames which require longer computation searches in 
sequential decoding. Now suppose that it fails on all the long 
computation searches (say, above 1000 computations/bit) and suc- 
ceeds whenever such long searches are not present in a frame. 

Then the incorrectly decoded frames will contain all of these 
long searches, and since sample size is reduced by a factor of 
10, the probability distribution of long computations is raised 
by a factor of 10. 

In any case, we shall be upper bounding the decoding com- 
plexity for the incorrectly decoded frames if we use the ordinary 
Pareto distribution divided by p , the frame deletion (or incor- 
rect predecoding) probability. 

Also of importance is the fact that for ordinary sequential 
decoding, the average number of computations per bit was 1.245, while 
for the 10% of the frames which were incorrectly predecoded, this 
rose to only 1.89 computation per bit. Thus, even though the 
tail of the distribution rose by a factor of 10, the average 
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computation effort rose by only a factor of 1.5 


5.4 Overflow Probability of Hybrid Implementation . Over- 
flow with resulting deletion or errors can occur in either of 
two ways: 

a) A search is so long that an incorrectly predecoded 
(deleted) frame reaches the end of delay line D prior 
to the completion of its processing by the sequential 
decoder ; 

b) While a long search is proceeding on a given frame, 
the sequential decoder buffer fills up with B frames 
awaiting processing and cannot accept any new deleted 
frames. 

We assume that normally the buffer is neatly empty so that 
successive overflows are essentially independent. This is justi- 
fied for sufficiently large Pareto exponents (p>2), which we 
shall insure. Then for an initially empty buffer, the probability 
of overflow for any given frame is Pg = p^ [Pr (overflow on a 
deleted frame)] 


Po 



( more than B 
deleted frames 
out of K frames 


sequential decoding 
lasts K frame times 


/sequential decodina of 
\frame lasts K frame times 



P* Pr 


( sequential decoding of given' 
frame lasts D frame times 
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where p^ is the probability of deleting (incorrectly predecoding) 

* 

a frame. It is shown in Appendix C that 

exp (Bp /2) 10 3(1_p) 10 3(1 " p) 

P 0 < 1 + — (5.1) 

Bp^ (pB/p^) p ( U D) P 

provided p^ < B/2D, 
where p^ = deletion probability 
B ■ buffer size in frames 
D = delay line size in frames 

10 3 = frame length in bits 

p = Pareto exponent 

p = speed factor 

Thus, it appears that the buffer size is effectively increased 
by a factor of 1/p^ • 

5.5 System Analysis of Possible Hybrid Implementation . We 
now analyze both soft quantized and hard quantized hybrid systems, 
and compare each with ordinary sequential decoders. Our goal 
will be to achieve an error probability on the order of 10“ 8 . In 
order to establish >a basis for comparison, we assume that in each 
case we have available a digital delay line of length 256 K bits 
and a sequential decoder buffer capable of storing 64 K bits. 

Also we assume that the data speed is so high that y=2.5 computa- 


*Note, a correctly predecoded frame naturally cannot overflow 
since it is not processed by the sequential decoder. hence, the 
overall overflow probability is p times the overflow probability 
for deleted frames. In particular, if we require Po ; 10“ 8 with 
a p S0.1, the overflow probability for deleted frames must only 
be no greater than 10“7. 
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tions per bit is the highest speed factor feasible. 

Thus in each case D=256. For a soft quantized (Q=8) system 
B=64/6 = 10.7 (since each branch requires 6 bits of storage), 
while for a hard quantized (Q= 2) system, B=64 (since only one 
syndrome bit needs to be stored per branch, and either parity 
bit is inserted in the delay line for the deleted frames) . 

We consider first the requirements of ordinary (non-hybrid) 
sequential decoding. For soft quantization, we assume a K=25 
rate 1/2 nonsystematic code blocked in 1000 bit frames separated by 
24 branch blocks of known symbols. Then the overflow probability 
(error probability if deletions are treated as errors) is 

10 3 

P 0 S ~ 

(10 3 By) p 

with B=10.7 and y= 2.5, it is clear that we must have p=2.5 
for Pq < 10~ 8 . This corresponds to E^/Nq : 4.7 db. An addi- 
tional 0.1 db loss results from the resynchronizing sequence 

of 24 bits for every 1000 data bits. The undetected error 

-kR /R 

probability is approximately P E < 10 3 2 com P Since R comp 

corresponds to E^/Nq : 2.6 db and K=25, we find P E < 10“ 9 . 

Thus, the error rate due to both deletions and undetected errors 
is less than 10~ 8 . 

For the hard quantized ordinary sequential decoder, we take 
the automatic resynchronization implementation described in 
Section 3. 2. 2.1 which does not require framing. We take either 
a K=25 nonsystematic code or a K=45 systematic code. Here B=64 
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since we store only the syndrome, using the delay line with D=64 
to store the information bit (for a systematic code) or a parity 
bit (for a nonsystematic code). Then as is shown in Section 3.2, 
the bit error probability due to overflows is roughly 

„ 200 
P < — 

B " (10 3 Bp) p 

If p=2.5 and B.64, it is clear that with p=2, we have P D < 8xl0” 9 . 
This corresponds to E b /N 0 = 6.2 db. The undetected error prob- 
ability is of the same order of magnitude. 

We now turn to hybrid implementations, beginning with soft 
quantization. Having fixed the buffer and delay line sizes, the 
only parameters to be varied are speed factor p and E b /N 0 , which 
establishes the Pareto exponent, p. Clearly to minimize cost 
of implementation, we should try to minimize p. On the basis 
of average number of computations, it would appear that p could 
be reduced almost in proportion to p , the deletion rate, since 
only a fraction p^ of the frames must be processed by the se- 
quential decoder and the average number of computations only 
rises slightly for these frames. 

However, it appears from (5.1) that the minimum value of p 

is limited by the magnitude of p , , B, D. We indicated in Section 

9 

5.2 that with Q=8 and E b /N 0 = 4.5 db, the deletion rate p^=»025 
and that lower Eb/N 0 results in greatly increased deletion rates. 
Also sequential decoding with Q=8 and Ej-,/No= 4.5 db results in a 
Pareto exponent p=2.2 (Ref. 2). Then with D=256, B=10.7, p=2.2, 
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it is possible to make y=0.4 and achieve P 0 = 10” 8 . With K=40, 
the undetected error probability is well below this. Thus, we 
have reduced the speed factor by 6. If we made E t) /N 0 = 4.7 db 
with a corresponding p=2.5 (as for the ordinary sequential decoder 
above, we could make y=0. 1 with a corresponding saving of a fac- 
tor of 24 in speed) . 

For hard quantized hybrid decoding, we have D=256, B=64. 

If we take p=2 corresponding to crossover p=.02 and Ej 3 /N 0 =6.2 db, 

we have estimated in Section 5.2 that p. : .10. Then it appears 

v 

from equation (5.1) that we can make y no less than 1.25 with a 
resulting P 0 = 10“ 8 . In every case, K=40 requires an additional 
0.2 db for resynchronization. These results are summarized in 
Table 5.1. 


5.6 Conclusions . With soft quantization (Q=8) , we have 
fourtd that the principal advantage of hybrid decoding is the 
saving of a factor of 6 in speed factor and 0.2 db in E^/Ng or 
a factor of 24 in speed with no gain in E^/Ng. This is due 
primarily to the fact that for undeleted frames only 1 bit/branch 
storage is required rather than 6 bits/branch. For a soft 
quantized sequential decoder operating on data transmitted at 
speeds of 20 Mbps, such a saving is crucial for feasibility and 
cost, since soft quantized metric computations need be made only 
at 8 MHz (for y = .4) or 2 MHz (for y = .l) speeds rather than 50 
MHz (for y =2 . 5) , as would be required with ordinary sequential 
decoding. 
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SUMMARY OF HYBRID AND SEQUENTIAL DECODER IMPLEMENTATION FOR P =10“ 
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On the other hand, with hard quantization (Q=2) , only 

the moderate saving of a factor of 2 in computation speed is 

achieved, primarily because the storage, saving is not nearly 

as great. The only advantage is that afforded by quadrupling 

delay line size. Further increase of D will improve matters, 

of course. For example, we might increase D to 512 frames 

(512 K bits) and thereby achieve a reduction of speed factor 

to y*0.6 (an overall reduction of a factor of 4 in speed). 

Further significant reduction of y for hard quantized data 

is not feasible, since the deletion rate is p. « 0.1 but the 

9 

average computation rate for sequential decoding of deleted 
frames is 1.5 times the average for all frames; thus, the speed 
can certainly not be reduced by more than a factor of 6 with- 
out degrading performance. (For soft quantization, the much 

lower deletion rate p, = 0.025 made a much greater speed reduc- 

9 

tion possible.) 

Hybrid decoding increases complexity in three ways. It 
requires; 

a) A predecoder operating at the data speed, and frame 
error detection equipment. 

b) A long delay line is required (the cost of such serial 
storage represents a small increment) . 

c) Blocking of data and reinsertion synchronization. 

For a soft quantized sequential decoder, blocking of data 
is probably required in any case, and the price of (a) and (b) 
above is small indeed for a speed factor reduction of an order 
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of magnitude which it gains. In fact, at data speeds above 
20 Mbps, this may be the only way to achieve 10“ 8 error rates 
with the E^/Nq advantage of soft quantized data. 

On the other hand, with a hard quantized sequential decoder 
blocking of data is not required. The moderate speed factor 
advantage may not be sufficient to justify the costs of the 
hybrid system. 
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APPENDIX A 


Computational Technique for Evaluation of 
Convolutional Code Performance 


The calculation of a convolutional code transfer function 
essentially involves the inversion of the code transfer matrix. 

For the K=3, rate 1/2 code shown in Fig. 2.2.1 we have the follow- 
ing linear relations among nodes, or states of the diagram: 


Xl 0 


0 

0 

N 


Xl o 

Xi 1 

— 

ND 

ND 

0 

• 

Xl 1 

Xo 1_ 


D 

D 

0 


_x° 1 _ 


and T(N,D) = D 2 Xo 1 (N, D) 


(A. 1 ) 


Thus denoting the state column vector, x, the transfer matrix A, 
and the column vector (1,00...0) * 1, we find that in general * we 
must solve 

6 . 

[I - A] x = ND 1 (1) 


*In this case <5^=2; in general the best binary code for any K and 
R will have all l’s in the first branch, corresponding to 6 i =l/R. 
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where 6^ is the weight of the initial branch. The magnitude 
of the eigenvalues of the matrix A are all less than unity, 
for otherwise the code can be shown to be catastrophic. Con- 
sequently, the inverse of I-A exists and 
<5 . 

X = ND 1 [I-A]" 1 1 


= ND 


[I+A + A + A 2 + ... +A n + ...] 1 


(A. 2) 


Finally it follows from eq. A. 3 that the transfer function 


is the scalar bilinear form 
T (N, D) = ND 


( 6 ,+ 6 f ) 

1 1 ' [I+A + A 2 + ... + A + ...] 1 


(A. 3) 


where 1 ' = (0,0, 
final branch. 

In general, 


..1) is a row vector and 5^ is the weight of the 


the normalized truncation error is bounded by 

i i » i i n 


ND 


(6 i+ 6 f ) 


< Y, l|A|! k = 

k^n 1- 


The norm of a matrix is just the magnitude of its largest eigen- 
value, which must be less than unity as noted above. Obviously, 
for a given n this is a decreasing function of D. 

The first-event error probability of eq. 2.2.5 is obtained 
directly from A. 3 by setting N=1 and D at the channel parameter 
values desired. The bit error probability of eq. 2.2.6 is 
obtained by numerically differentiating eq. A. 3 at N=l; that 
is, by approximating the derivative by 
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dT(N,D) 

dn 


N=1 


(A. 4) 


, T (1+e , D) • T (1 , D) 

G 

Since T(N,D) is a polynomial in N with positive coefficients, 
the second derivative is always positive. Consequently, taking 
g > 0 , yields an upper bound on the derivative while c<0 yields 
a lower bound. 

In general, multiplying matrices of dimension 2 is a 
very lengthy numerical procedure. But here we note that actually 
all that is required is successive multiplication of matrices 
by a vector, since the nth (vector) term of eq. A. 2 is obtained 
from the (n-l)st by 

.n. . .n-1, 

A 1 = A* A 1 

Also, the matrix A has at most, 2 nonzero entries per row an in 
all 2 -3 entries. Thus the total number of multiplications re- 
quired in computing the first n terms of eq. A. 2 and A. 3 is less 
than n2 . Thus even a K=10 code at high channel noise level, 
which may require n=100 for accuracy, can be evaluated in seconds. 

We note finally that the distance properties of a given code 
can be evaluated using eq. A. 3 independent of the channel charac- 
teristics. For, setting N=1 and D=10 ^ in eq. A. 3, where d is 
an integer, we obtain 

-d6 , 

T (D) = 10 M (k^ + k v+1 10" Q + k y+2 10~ /a + ...) (A. 5) 
where k is the number of paths of minimum weight <5 , k , . is 

p r v p+1 
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I 


the number of paths of weight 5^ +1 , etc. Thus provided k^+Y<10 d 
for y ■ 1, 2 , . .., the first few nonzero digits will determine 
the integer k^ + ^, etc. The number 6f terms which can be correctly 
determined in this way depends on the word size of the computer. 

It may also be possible to determine higher order terms by sub- 
tracting the effect of already determined lower order terms and 
renormalizing . 

The total number of bits in error in the union of all erro- 
neous paths at a given distance from the correct path can similarly 
be obtained by first differentiating as in eq. A. 4 and then 
applying the procedure of eq. A. 5. The coefficient with subscript 
p+Y now denotes the total number of bit errors in all paths at 
distance 6 ^+y from the correct path. 
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Effect of Memory Truncation 


This technique can be extended to determine the effect of 
memory truncation on the first-event error probability. Suppose 
path memory is truncated n branches prior to the last received 
branch and that a maximum likelihood decision is made to determine 
the output bit at that point. In this case, an error occurs 
whenever the likelihood function of any state (or node in the 
state diagram) exceeds that of the all zeros state, assuming 
this was the correct path. The probability of this event is 
union bounded by 

T (D) = D y A 11 ” 1 1 


where u = 1111... 1 (row vector) 
and 1 = 100... 0 (column vector) 
and A is a function of D only. 

6 . 

Note that the first branch is accounted for by D 1 . we note 
that truncation eliminates the ordinary errors due to remerging 
paths beyond the nth. Thus a union bound on the overall first- 
event error probability is 


P E < T(D) + T fc (D) 
n-2 


( 6 



Jo f y 

jl + A + A 2 + . . . A n " 2 

1 + uA n_1 lJ 


For the Gaussian channel, this may be refined by multiplying this 

-5 

expression by erfc(/<s re /N n ) D U as in eq. 2.2.5. 

y b u 
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APPENDIX B 


A variety of communication systems can be mechanized to 
include an output variable which is bimodal, with a large mean 
when system operation is unreliable and a small mean when it 
is reliable. An important source of possible unreliability is 
in the system synchronization. The required variable to deter- 
mine unsynchronized operation is the estimate of the number of 
errors made, which in a coded system may be the modulo-2 sum 
of the received sequence and the nearest possible codeword. 

For synchronized operation, the expected relative frequency of 
a one, p g , is just the channel probability of error, while for 
unsynchronized operation, it will be considerably higher, 

p >>p . We shall assume that successive symbols of this observed 
u 1 s 

* 

sequence are independent. 

Detection of unreliable operation then proceeds as follows. 
After each bit (event) time, a one is subtracted from a counter 
if the bit was zero. If the bit was a one, an integer k-l(k>l) 
is added to the counter. Only non-negative summands are stored; 
if the total sum ever becomes negative, it is reset to zero. 

Thus , we have a reflecting boundary at the origin . Whenever the 
count reaches a threshold N, we detect unreliable operation. Thus 
there is an absorbing boundary at N . By making kp s <l and kp u >l, 
it follows that the expected drift is kp^-1 > 0 (to the right) 

*This is clearly true for synchronized operation if channel errors 
are independent; for unsynchronized operation, dependencies will 
actually improve operation. 
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when the system is unreliable (unsynchronized) , while it is 

kp -1 < 0 when it is reliable (synchronized) . 
s 

As is standard, we define a false alarm as the event that 
reliable operation causes a threshold crossing, and detection 
as the event that unreliable operation causes a threshold 
crossing. Of interest are the first passage time statistics 
in both cases, and in particular, the first and second moments; 
i.e., mean time to false alarm and detection and the corres- 
ponding variances. 
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Exact Analysis 


For independent events, the Markov sequence random walk is 


completely characterized statistically by the transition matrix, 


where p equals p or p corresponding to either system mode 

S u 

(Ref. 1). 


P 



A 

i 


I 


N+l 


V 


We number the rows and columns from 0 to N, corresponding to 
the states (contents) of the counter. The term indicates 

the probability of a transition from state i to state j. We 

* 

assume as a worst case that the counter always starts in state 0. 


*This will be the case in the synchronized mode if we take our 
time origin as the instant of initial synchronization. For the 
unsynchronized mode, the initial state will be the state of the 
counter when synchronization is lost, but the first passage 
time to detection will then be upper bounded by taking this to 
be 0. 


- 220 - 






Then the probability distribution of state occupancy 
(ir^, after one transition (bit time) is 

; (1 ) = (1, 0, 0,...0) P 


(B.l) 


and after n transitions it is 


ir (n) m ^ (n-1) p 0 (1> Q...0) P n 


(B. 2) 


Now we are interested in the distribution of time to first 
arrival at state N. Clearly, the Nth component of ; ^ 


^ n) = Pr(v N - n) 


(B . 3 ) 


where v N is the time of first arrival (passage) at the threshold 
N. Thus equations B.l and B.2 yield the desired distribution. 
From this, we can obtain the mean and variance of first passage 
time by 


«<v -t - tr -•r i> ] - tK n) ] 

n=l n=0 


(B. 4 ) 



(B . 5 ) 


The numerical algorithm which generates (3) simply post-multiplies 
(1, 0...0) by P n times successively. Each time it selects the 
last term of the resulting vector, which is the distribution 
tt^ . Also, it augments an accumulator to form the mean of 
equation B.4 and also forms the weighted function of equation 
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B.5 to obtain the variance. 

While equations B.3, B.4, and B.5 yield all the results 
desired, the mean time and hence the mode of the distribution 
will tend to be very large for the false alarm and hence the 
total number of iterations required for a meaningful result 
may be several million, thus rendering this direct approach 
impractical. 
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Asymptotic Analysis 


When the ratio of the threshold N to the maximum step size 
k-1 is very large, we may model this process as a continuous ran- 
dom walk. For this purpose, let us consider spatial parameters 
and continuous time, each step taking At seconds and being either 
-Ax or + (k-1) Ax. Thus, the threshold becomes 

a = NAx (B.6) 

the mean drift per unit time 

m = (kp-1) (Ax/At) (B . 7 ) 

and the drift variance per unit time 

g 2 = k 2 p (1-p) { Ax) 2 /At (B . 3 ) 

The moment-generating function for the first passage time 
of this continuous random walk satisfies the equation (Ref. 2) 

d 2 fjx,Xl + m dfU . , , j) , = X f (x , X ) (B. 9) 

2 dx 2 dx 


where f(x,X) = E je *f(x,t)J =J' e ^ fc f(x,t)dt 
and f(x,t) = d/dt |pr ( v fc < t | starting at x)j 

From this, we can also obtain equations for the mean first passage 

time E(v, | starting at x) = : 

r X=0 


a 2 d 2 ti (x) + m dt i 

2 dx 2 dx 
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= -1 


(B. 10) 



starting at x) 


and more generally for the jth moment t.(x) = E(v^ 

3 i- 

cr 2 d 2 t . (x) 

JL + m dt . (x) = -jt. . (X) (B. 11) 

2 dx 2 3 3-1 


j = 2, 3, 


Thus each moment can be obtained by iterating on the solution for 
the next lower equation. Of course, we could also obtain the 
negative jth moment by differentiating the moment generating func- 
tion f(x,A) and setting A=0. 

The boundary conditions associated with equations B.10 and 
B. 11 are obtained as follows. Since the threshold (absorbing 
boundary) is at x=a, and since the dependent variable x indicates 
the starting point, 

tj (a) = 0 j = 1, 2, (B. 12a) 

Also, whenever x becomes negative, it is automatically returned 
to zero, giving rise to the boundary condition, 


dt j (x) 
dx 


x=a 


= 0 


j * lr 2, 


(B.12b) 


For the moment generating function 


t. (x) 


*(*.») = t 

3=0 3 


(-A) 3 


since t^ (0) = 1, we obtain the boundary conditions 

f (a, A ) = 1 


df (x, A ) 
dx 


x=a 


= 0 
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The solution of B.10 for the mean first passage time with 


the boundary conditions (B.12a, b) is 


tj (x) = 


e - 2 m a/a 2 _ e - 2 m x/a 2 
2m 2 /a 2 


a-x 

m 


Insertion of this solution into B.ll for j=2, yields the second 
moment t 2 (x) . Of interest are, 


E(v fc | starting at 0) = tj (0) 


1 

m 


e - 2 m a/a 2 
2m/ a 2 


(B . 15) 


and 

Var(v fc | starting at 0) ■ t 2 (0) - t?(0) 

= £ ^(l+2e- 2am /° 2 ) + (e - 2 am/a 2 _ 1} (e - 2 am/a 2 +5) a 2 /4m ] 

(B . 16 ) 


It is also possible to solve for the entire moment-generating 
function by solving equation (9) with the boundary conditions 
(B.13a, b) . The result evaluated at x=0 is 


f (0,X) 


E 



starting at 


0 


. g 2Z&JL 


5 2 e 


E i a 


1 “ _ 


E i e 


E 2 a 


E 1 , E 2 = ( -In - \/ m2 + 2 a 2 A ) / a 2 

t t (0) and t 2 (0) of equations B.15 and B.16 could also be obtained 
as the negatives of the first and second derivatives of B.17 
at A=0. The inversion of equation B.17 to obtain the density 
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is not a simple procedure and could be of questionable 


* 

function 

value. 


*A Chernoff bound on the distribution is easily obtained and is 
of some value in the unsynchronized detection case. 
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In the synchronized case, on the other hand. 


v S | starting at oj = 


1 (exp 

2N(l-kp s )/k 2 p s (l-p s ) 

- 1 


2 (1-kp ) /k 2 p (1-p 

) 


+ N 


(B.21) 


which grows nearly exponentially with N, while 





k 2 p (1-p )] 2 


Var 

v S | starting at 0 


S s exp 

4(l-kp_) v 

4N(l-kp s )/k 2 p s (l-p s ) 


so that 

Var (v S ) . 1 

E 2 (v ) 
s 


(B . 23 ) 


From B.20, we see that for detection of unsynchronized operation, 
the normalized variance is very small for k/N, so that the mode 
is quite peaked. On the other hand, from (B.23), it follows, that 
for a false alarm when the system is synchronized, the mode is 
quite broad, reminiscent of either a Poisson or a Rayleigh dis- 
tribution. 

The basic assumption which justifies the continuous model is 
that k/N<<l. If this is not the case, the asymptotic formulas 
lose their validity and the numerical exact solution is required. 
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Application to Loss-of-Phase-Lock Indicators 


As an application of the above techniques to other than 
decoder synchronization, consider an unmodulated carrier or 
subcarrier tracking loop. A convenient measure of loop per- 
formance is the sign of cos <j> {t) where <J> (t) is the instantaneous 
phase error, and cos <J> (t) can be generated by multiplying the 
received signal by the quadrature VCO output and low pass fil- 
tering. Thus cos <}>(t)>0 implies 1 ( t ) | < tt/ 2, while cos <}> ( t ) < 0 
implies v/2 < | $ (t) | < it. 

Now suppose we sample this signal periodically, but with a 
sufficiently long period that successive samples are nearly in- 
dependent. The counter contents are incremented by k-1 when- 
ever the sample is negative and reduced by 1 whenever the sample 
is positive. When the phase-locked loop is properly tracking, 
the phase error probability density function (Ref. 3) is given 
exactly for a first-order loop, and approximately for higher 
order loops, by the expression 

acos<J> 

p ($) = f! ' — tt < <J» < u 

s 2 ttI o ( a ) “ “ 


where a = S/NoB L 

When the loop is out of lock, on the other hand 

P (<i>) = ir # £ * 

U Z 7T — — 
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Thus, clearly, using the notation of the previous sections. 


P 


u 



p u m d* = \ 


while 


p s ■ *£ /2 ’.<♦> ** 


and is given for various values of a in Table 1, which is extracte 
from Figure 4.7 of Reference 3. Thus even at 0 db, p u > 2p g so 
that the approach seems practically feasible. 


a 

p s 

6 db 

<.001 

3 db 

.09 

2 db 

.13 

1 db 

.18 

0 db 

.22 


Table B.l p g as a function of signal-to-noise ratio 
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APPENDIX C 


Overflow Probability in a Hybrid System 


The overall overflow probability of the system for an 
initially empty buffer is 


Po 


p. • Pr (overflow in a deleted frame) 
♦ 


| p /more than B 

" \ deleted fra] 


frames 
in K trials 


+ Pr (K>D) 


sequential decoding\ p 
lasts K frame times/ s ' ' 

(C.l) 


, p /sequential decoding of deleted frame\ 

s' ' r \lasts exactly K frame times / 

We have established in Section 5.3 that the computational 
distribution for deleted frames is upper bounded by the Pareto 

distribution divided by the deletion rate p , . 

<p 

Thus for 1000 bit frame, we shall use 

10 3 


Pr (K>J) _< 


p (10 3 Jp) p 


and consequently* 


P S (K) 


10 3 


(1-P) 


P.P P K P 


(C. 2) 


Since frames are deleted independently for a memoryless channel 


♦This is the only strictly valid bound which we can obtain from a 
bound on the cumulative distribution. Also this is the only upper 
bound which we shall use whose tightness is questionable. All 
further bounds are very tight, and all approximations can be changed 
into tight upper bounds by including appropriate additional terms 
which approach zero for very large buffer sizes. 
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p /more than -B sequential decoding^ 

(deleted frames lasts K frame times; 


■ ,.t ft) 


B ... . K-B KH (B/K) B.. .K-B ,, 

(1-p^) < e ' 'p^ (1-p^) (C. 3) 

/Til 


The inequality is a Chernoff bound for the binomial distribution 

provided B/K > 2p, and 

<P 

H (Y) * -YlnY-U-Y) In (1-Y) 


Thus inserting C.2 and C.3 in C.l, we obtain for the overall 


overflow (error) probability 


Pfl < 


£ e KH(B/K) (1 . p ,K-B 
K=B+l( /57b ' V 


B 10 3<1-P) ) IQSU-P) 


,3d-P) ! 


exp B JxH(l/x) + (x-l)/.n(l-p < j i )J p® 
__ _ __ 


(C.4) 


where x = K/B and B/D >> 2p 


Since K ranges over the integers and B>10, x increases in increments 
of 1/B<0 . 1 . Thus, we may accurately approximate the sum by an 
integral and obtain 


(1-p) 


Pn < 


> * xB_p H)’ <X " 1)B e ' r(x ' 1) - 


(C. 5) 


where y = -B£n(l-p.) > 0 


(C • 6 ) 
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Using the tight inequality. 




< e 


we obtain, 


10 3 


(1-p) 


P 0 5 


/•D/B 

/ 


. B B+y /*D/B 

2 + P * 6 ( „B-P ^-YX 

Ld P /Sir B p+ls 


e T * dx 


(C . 7 ) 


The integral is just the incomplete gamma function which is 
closely bounded by taking the complete integral 


Pn < 


10 3(1 " p) 


x , p B e B+Y r(B-p+i) 
+ /7JT B P +i 5 Y fi r P^- 


Now substituting (c.6) for y and noting that 




-B 


BB-c+1 “ B B-P + J[-f l |(l-p )]®-^ S <BP * )I ’ B ' 1(1 ' P * /2) B 


we obtain 


l 0 3(l- p ) 


Pn < 


LD 


pj" 1 e B T (B-p+1) 
/37 B B+3/2 (i-p./2) B 


(C . 8 ) 


Finally using Stirling's formula 


r (n+1) : 
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since B>>p, 


we obtain, using (1-p/B) B ” p *e” p 







n p-i 





P < 

10 3 (1-p) 


L. + _ 

P <j> 






M P 


D P B 

P+1 (i-p 

/ 2 ) B 

-» 



or since 

(1-P/2)-® 

- B P,/ 2 

~ e Y 








P < 

10 3 (1-p) 

1 

_ . 1 

b p/ 2 

e Y 

\ i 

J 

(C. 9) 


P o - 


Tvfijp T ^ 

B P* 

♦ / 


f 

provided 

B/D > 2p^, 

the first 

term being 

due to 

delay line over- 

flows and 

the second 

to buffer 

overflows. 

Thus 

the buffer 

size is 


effectively increased by the factor 1/p^ . 
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